| title | source_url | ingested | sha256 |
|---|---|---|---|
| From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments | https://arxiv.org/abs/2606.04275 | 2026-06-17 | <computed> |
From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
Authors: Saket Tiwari, Tejas Kotwal, George Konidaris — Brown University, Dept. of Computer Science & Applied Mathematics
Published: ICLR 2026
arXiv: 2606.04275v1 [cs.LG] (2026-06-02)
Abstract
The paper presents a novel theoretical framework for deep RL in continuous environments, modeling the problem as a continuous-time stochastic process and drawing on stochastic control. It introduces a viable model of actor-critic that incorporates both exploration and stochastic transitions. For single-hidden-layer neural networks, the state of the environment can be formulated as a two-time-scale process (environment time and gradient time). Using stochastic differential equations, the authors derive an equation describing the infinitesimal change in the state distribution at each gradient step under a vanishingly small learning rate, which they present as the first such result in continuous RL. The result is corroborated empirically on a toy LQR continuous control task.
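To make the two time scales concrete, the block below writes out a generic control-affine SDE in environment time $t$ and the standard gradient-flow limit in gradient time $\tau$. This is textbook notation chosen for illustration, not the paper's own equations: the symbols $f$, $g$, $\sigma$, $\pi_\theta$, $\mathcal{L}$, and $\eta$ are all assumptions introduced here.

```latex
% Environment time t: control-affine dynamics driven by a Wiener process W_t,
% with the action a_t drawn from a parameterized policy \pi_\theta.
\begin{align}
  dX_t &= \bigl( f(X_t) + g(X_t)\, a_t \bigr)\, dt + \sigma(X_t)\, dW_t,
  \qquad a_t \sim \pi_\theta(\cdot \mid X_t).
\end{align}

% Gradient time \tau: a discrete update with learning rate \eta on a generic
% objective \mathcal{L} becomes a gradient flow as \eta \to 0, with \tau = k\eta.
\begin{align}
  \theta_{k+1} &= \theta_k - \eta\, \nabla_\theta \mathcal{L}(\theta_k)
  \quad \xrightarrow{\;\eta \to 0\;} \quad
  \frac{d\theta_\tau}{d\tau} = -\nabla_\theta \mathcal{L}(\theta_\tau).
\end{align}
```

Under this reading, the law of $X_t$ depends on $\theta_\tau$, so the state distribution evolves along $\tau$ as well as along $t$; the specific equation governing that evolution is the paper's contribution and is not reproduced here.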
Key Concepts
- continuous-time-rl / stochastic-differential-equation
- wiener-process / ito-calculus
- two-time-scale-process (environment time + gradient time)
- exploratory-dynamics — SDE with policy + environment noise
- linearized-neural-network / neural-tangent-kernel / infinite-width-limit
- martingale-clt / control-affine-mdp
- linear-quadratic-regulator
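Several of the concepts above (SDE, Wiener process, linear-quadratic regulator, policy plus environment noise) can be illustrated with a minimal Euler-Maruyama simulation of a scalar linear system under linear feedback. This is a sketch under stated assumptions, not the paper's model or experiment: the function name `simulate_lqr_sde`, all parameter values, and the choice to inject policy noise through the control channel as a second independent Wiener process are hypothetical.

```python
import numpy as np


def simulate_lqr_sde(a=-0.5, b=1.0, k=1.0, sigma_env=0.2, sigma_pi=0.3,
                     q=1.0, r=0.1, x0=1.0, dt=1e-3, n_steps=5000, seed=0):
    """Euler-Maruyama simulation of a scalar controlled linear SDE.

    Assumed (illustrative) dynamics:
        dX = (a X + b u) dt + b * sigma_pi dW_pi + sigma_env dW_env,
        u  = -k X   (mean action of a linear feedback policy).
    Returns the state path and the accumulated quadratic cost
        sum over steps of (q X^2 + r u^2) dt.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    cost = 0.0
    sqrt_dt = np.sqrt(dt)

    for t in range(n_steps):
        u = -k * x[t]                               # mean control action
        cost += (q * x[t] ** 2 + r * u ** 2) * dt   # running LQR cost

        # Independent Wiener increments, each with variance dt.
        dw_env = sqrt_dt * rng.standard_normal()    # environment noise
        dw_pi = sqrt_dt * rng.standard_normal()     # exploration noise

        drift = (a * x[t] + b * u) * dt
        diffusion = b * sigma_pi * dw_pi + sigma_env * dw_env
        x[t + 1] = x[t] + drift + diffusion

    return x, cost


if __name__ == "__main__":
    path, total_cost = simulate_lqr_sde()
    print("final state:", path[-1])
    print("accumulated cost:", total_cost)
```

Setting both noise scales to zero reduces the update to the deterministic recursion `x[t+1] = (1 + (a - b*k)*dt) * x[t]`, which is a quick way to check the drift term in isolation.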
Key Results
- Closed system of only 5 time-dependent variables describing one-step gradient change
- First equation for gradient-time evolution of state distribution under vanishing step size for NNs
- Nonparametric formulation bridging stochastic control and over-parameterized RL
- Exploratory dynamics outperforms additive Wiener noise in state-action coverage