---
title: "From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments"
source_url: https://arxiv.org/abs/2606.04275
ingested: 2026-06-17
sha256: <computed>
---

# From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

**Authors:** Saket Tiwari, Tejas Kotwal, George Konidaris (Brown University, Dept. of Computer Science & Applied Mathematics)

**Published:** ICLR 2026

**arXiv:** 2606.04275v1 [cs.LG] (2026-06-02)

## Abstract

The paper proposes a theoretical framework for deep RL in continuous environments that models the problem as a continuous-time stochastic process, drawing on stochastic control. It introduces a viable model of actor-critic that incorporates both exploration and stochastic transitions. For single-hidden-layer neural networks, the state of the environment can be formulated as a two-time-scale process (environment time + gradient time). Using stochastic differential equations, the authors derive, for the first time in continuous RL, an equation describing the infinitesimal change in the state distribution at each gradient step in the limit of a vanishingly small learning rate. The theory is corroborated empirically on a toy LQR continuous control task.
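
As a reading aid, the two time scales mentioned above can be written schematically as follows. The notation is illustrative and is not taken from the paper: the drift terms $f$ and $g$, the noise scale $\sigma$, the policy $\pi_\theta$, the training objective $L$, and the learning rate $\eta$ are all assumed symbols, and the paper's actual dynamics may differ.

$$
\begin{aligned}
\text{environment time } t \ (\theta \text{ fixed}): \quad & dX_t = \big(f(X_t) + g(X_t)\,\pi_\theta(X_t)\big)\,dt + \sigma\,dW_t, \\
\text{gradient steps } k: \quad & \theta_{k+1} = \theta_k - \eta\,\nabla_\theta L(\theta_k), \\
\text{gradient time } \tau = k\eta,\ \eta \to 0: \quad & \frac{d\theta_\tau}{d\tau} = -\nabla_\theta L(\theta_\tau).
\end{aligned}
$$

In this notation, if $\rho_t^{\theta}$ denotes the distribution of $X_t$ under parameters $\theta$, the object of interest is how $\rho_t^{\theta_\tau}$ changes with $\tau$.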

## Key Concepts

- [[continuous-time-rl|continuous-time reinforcement learning]] / [[stochastic-differential-equation|stochastic differential equation]]
- [[wiener-process|Wiener process]] / [[ito-calculus|Itô calculus]]
- [[two-time-scale-process|two-time-scale process]] (environment time + gradient time)
- [[exploratory-dynamics|exploratory dynamics]]: SDE with policy + environment noise
- [[linearized-neural-network|linearized neural network]] / [[neural-tangent-kernel|NTK]] / [[infinite-width-limit|infinite-width limit]]
- [[martingale-clt|martingale central limit theorem]] / [[control-affine-mdp|control-affine MDP]]
- [[linear-quadratic-regulator|LQR]]
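
To make the idea of an SDE driven by both policy noise and environment noise concrete, here is a minimal Euler-Maruyama sketch for a scalar linear system with linear feedback `u = -k * x`. It is an assumption-laden illustration, not the paper's exploratory dynamics: the function name `simulate_linear_sde`, the parameters `a`, `b`, `k`, `sigma_pi`, `sigma_env`, and the choice to inject policy noise through the control channel `b` are all hypothetical.

```python
import numpy as np


def simulate_linear_sde(a, b, k, sigma_pi, sigma_env, x0=1.0, T=1.0, dt=1e-3, seed=0):
    """Euler-Maruyama for dx = (a - b*k) x dt + b*sigma_pi dW1 + sigma_env dW2.

    dW1 models exploration noise entering through the control channel b;
    dW2 models noise in the environment transitions (both are assumptions).
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        # Independent Wiener increments, each with variance dt.
        dW_pi, dW_env = rng.normal(0.0, np.sqrt(dt), size=2)
        drift = (a - b * k) * x[n]
        x[n + 1] = x[n] + drift * dt + b * sigma_pi * dW_pi + sigma_env * dW_env
    return x


# One sample path with both noise sources switched on.
path = simulate_linear_sde(a=0.5, b=1.0, k=2.0, sigma_pi=0.3, sigma_env=0.1)
```

Setting `sigma_pi=0.0` and `sigma_env=0.0` reduces the scheme to the deterministic Euler recursion for the closed-loop system, which is a convenient sanity check.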

## Key Results

- A closed system of only five time-dependent variables describes the change produced by a single gradient step.
- The first equation for the gradient-time evolution of the state distribution under a vanishing step size for neural networks.
- A nonparametric formulation that bridges stochastic control and over-parameterized RL.
- Exploratory dynamics outperform additive Wiener noise in state-action coverage.
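
The notion of a state distribution that evolves in gradient time can be illustrated with a toy two-loop simulation: an inner loop over environment time and an outer loop over gradient steps with a small learning rate. This is only a sketch under stated assumptions and does not reproduce the paper's actor-critic model or its five-variable system: it uses a scalar LQR-style problem, a linear policy `u = -k * x`, and a finite-difference gradient, and every name (`rollout_stats`, `lr`, `eps`, and the system constants) is hypothetical.

```python
import numpy as np


def rollout_stats(k, noise, a=0.0, b=1.0, sigma=0.3, q=1.0, r=0.1, x0=1.0, dt=0.01):
    """Environment time: Euler-Maruyama for dx = (a x + b u) dt + sigma dW, u = -k x.

    `noise` holds pre-drawn Wiener increments with shape (n_paths, n_steps).
    Returns the Monte Carlo quadratic cost and the second moment of the final state.
    """
    x = np.full(noise.shape[0], x0)
    cost = np.zeros(noise.shape[0])
    for n in range(noise.shape[1]):
        u = -k * x
        cost += (q * x**2 + r * u**2) * dt
        x = x + (a * x + b * u) * dt + sigma * noise[:, n]
    return cost.mean(), np.mean(x**2)


rng = np.random.default_rng(0)
dt, n_steps, n_paths = 0.01, 200, 256
# The same increments are reused for every evaluation (common random numbers),
# so the estimated cost is a smooth deterministic function of k.
noise = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))

k, lr, eps = 0.2, 0.05, 1e-3
history = []  # (gain, cost, final-state second moment) per gradient step
for step in range(50):  # gradient time
    cost, second_moment = rollout_stats(k, noise, dt=dt)
    history.append((k, cost, second_moment))
    # Central finite difference stands in for a policy-gradient estimate.
    cost_plus = rollout_stats(k + eps, noise, dt=dt)[0]
    cost_minus = rollout_stats(k - eps, noise, dt=dt)[0]
    k -= lr * (cost_plus - cost_minus) / (2 * eps)
```

Each entry of `history` pairs a point in gradient time with a summary statistic of the state distribution at the end of environment time, which is the kind of dependence the results above describe in the vanishing-step-size limit.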