20260617: currently 914 pages
35
raw/papers/tiwari-ticks-to-flows-2026.md
Normal file
@@ -0,0 +1,35 @@
---
title: "From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments"
source_url: https://arxiv.org/abs/2606.04275
ingested: 2026-06-17
sha256: <computed>
---

# From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

**Authors:** Saket Tiwari, Tejas Kotwal, George Konidaris (Brown University, Dept. of Computer Science & Applied Mathematics)

**Published:** ICLR 2026

**arXiv:** 2606.04275v1 [cs.LG] (2026-06-02)

## Abstract

The paper proposes a theoretical framework for deep RL in continuous environments, modeling the problem as a continuous-time stochastic process and drawing on stochastic control. It introduces a viable model of actor-critic that incorporates both exploration and stochastic transitions. For single-hidden-layer neural networks, the state of the environment can be formulated as a two time-scale process (environment time and gradient time). Using stochastic differential equations, the authors derive an equation describing the infinitesimal change in the state distribution at each gradient step under a vanishingly small learning rate, which they present as the first such result in continuous RL. The result is corroborated empirically on a toy LQR continuous control task.

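
To make the "environment time" side of this picture concrete, the following is a minimal sketch of an Euler-Maruyama rollout for a toy linear system under fixed linear feedback. It is an illustration only and is not taken from the paper: the function name `euler_maruyama_lqr`, the double-integrator matrices, the gain `K`, and the noise scales are all assumptions, and exploration is modeled here as plain additive Wiener noise in the control channel rather than the paper's exploratory dynamics.

```python
import numpy as np


def euler_maruyama_lqr(A, B, K, sigma_env, sigma_act, x0,
                       dt=0.01, n_steps=1000, seed=0):
    """Simulate dX = (A - B K) X dt + sigma_env dW_env + B sigma_act dW_act.

    Hypothetical helper: both noise terms are additive Wiener increments,
    which is a simplifying assumption, not the paper's model.
    """
    rng = np.random.default_rng(seed)
    A, B, K = (np.asarray(M, dtype=float) for M in (A, B, K))
    n, m = B.shape
    x = np.asarray(x0, dtype=float).copy()
    path = np.empty((n_steps + 1, n))
    path[0] = x
    for k in range(n_steps):
        # Wiener increments have standard deviation sqrt(dt).
        dW_env = rng.normal(size=n) * np.sqrt(dt)
        dW_act = rng.normal(size=m) * np.sqrt(dt)
        drift = (A - B @ K) @ x  # closed-loop drift under u = -K x
        x = x + drift * dt + sigma_env * dW_env + B @ (sigma_act * dW_act)
        path[k + 1] = x
    return path


# Assumed toy setup: a double integrator with a stabilizing gain.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 1.5]])
path = euler_maruyama_lqr(A, B, K, sigma_env=0.1, sigma_act=0.1, x0=[1.0, 0.0])
print(path.shape)
```

In this sketch each loop iteration is one tick of environment time. The gradient-time axis discussed in the abstract would correspond to updating `K` (or a neural policy) between rollouts, which is deliberately left out here.
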
## Key Concepts

- [[continuous-time-rl|Continuous-time reinforcement learning]] / [[stochastic-differential-equation|Stochastic differential equation]]
- [[wiener-process|Wiener process]] / [[ito-calculus|Itô calculus]]
- [[two-time-scale-process|Two time-scale process]] (environment time + gradient time)
- [[exploratory-dynamics|Exploratory dynamics]]: SDE with policy + environment noise
- [[linearized-neural-network|Linearized neural network]] / [[neural-tangent-kernel|NTK]] / [[infinite-width-limit|Infinite-width limit]]
- [[martingale-clt|Martingale central limit theorem]] / [[control-affine-mdp|Control-affine MDP]]
- [[linear-quadratic-regulator|LQR]]

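
For orientation, several of the concepts above meet in the generic form of a control-affine system driven by a Wiener process. The symbols $f$, $g$, $\sigma$, $u_t$, $W_t$, $A$, $B$ and $K$ below are standard textbook notation assumed for illustration; the paper's own notation and noise model may differ.

$$
\begin{aligned}
dX_t &= \bigl(f(X_t) + g(X_t)\,u_t\bigr)\,dt + \sigma(X_t)\,dW_t, \\
\text{linear (LQR-type) case:}\quad f(x) &= A x,\qquad g(x) = B,\qquad u_t = -K X_t .
\end{aligned}
$$

In the linear case with feedback $u_t = -K X_t$, the drift reduces to $(A - BK)\,X_t$, so the closed-loop state follows a linear SDE.
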
## Key Results

- Closed system of only 5 time-dependent variables describing one-step gradient change
- First equation for gradient-time evolution of state distribution under vanishing step size for NNs
- Nonparametric formulation bridging stochastic control and over-parameterized RL
- Exploratory dynamics outperforms additive Wiener noise in state-action coverage