myWiki/raw/papers/tiwari-ticks-to-flows-2026.md

---
title: "From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments"
source_url: https://arxiv.org/abs/2606.04275
ingested: 2026-06-17
sha256: <computed>
---

# From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

**Authors:** Saket Tiwari, Tejas Kotwal, George Konidaris — Brown University, Dept. of Computer Science & Applied Mathematics

**Published:** ICLR 2026

**arXiv:** 2606.04275v1 [cs.LG] (2026-06-02)

## Abstract

The paper proposes a theoretical framework for deep RL in continuous environments, modeling the problem as a continuous-time stochastic process and drawing on stochastic control. It introduces a viable model of actor-critic learning that incorporates both exploration and stochastic transitions. For single-hidden-layer neural networks, the state of the environment can be formulated as a two-time-scale process that evolves in both environment time and gradient time. Using stochastic differential equations, the authors derive what they describe as the first equation in continuous RL for the infinitesimal change in the state distribution at each gradient step as the learning rate vanishes. The result is corroborated empirically on a toy LQR continuous control task.

## Key Concepts

- [[continuous-time-rl|continuous-time reinforcement learning]] / [[stochastic-differential-equation|stochastic differential equation]]
- [[wiener-process|Wiener process]] / [[ito-calculus|Itô calculus]]
- [[two-time-scale-process|two-time-scale process]] (environment time + gradient time)
- [[exploratory-dynamics|exploratory dynamics]]: an SDE driven by both policy noise and environment noise
- [[linearized-neural-network|linearized neural network]] / [[neural-tangent-kernel|NTK]] / [[infinite-width-limit|infinite-width limit]]
- [[martingale-clt|martingale central limit theorem]] / [[control-affine-mdp|control-affine MDP]]
- [[linear-quadratic-regulator|LQR]]
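
As a reading aid for the concepts above, a control-affine system driven by a Wiener process, and the gradient-flow limit behind the notion of gradient time, are commonly written as below. The symbols $f$, $g$, $\sigma$, $\theta$, $\eta$, $\tau$, and the placeholder objective $\mathcal{L}$ are textbook notation assumed here; the paper's own notation and its actor-critic update may differ.

$$
dX_t = \big(f(X_t) + g(X_t)\,a_t\big)\,dt + \sigma\,dW_t
$$

$$
\theta_{k+1} = \theta_k - \eta\,\nabla_\theta \mathcal{L}(\theta_k)
\quad\xrightarrow{\;\eta \to 0,\ \tau = k\eta\;}\quad
\frac{d\theta_\tau}{d\tau} = -\nabla_\theta \mathcal{L}(\theta_\tau)
$$

Here $t$ is environment time, along which the state $X_t$ evolves under action $a_t$, and $\tau$ is gradient time, along which the parameters $\theta$ evolve.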

## Key Results

- Closed system of only 5 time-dependent variables describing one-step gradient change
- First equation for the gradient-time evolution of the state distribution under vanishing step size for neural networks
- Nonparametric formulation bridging stochastic control and over-parameterized RL
- Exploratory dynamics outperforms additive Wiener noise in state-action coverage
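
The two time scales can be illustrated with a minimal sketch on a scalar LQR-style problem. This is not the paper's actor-critic model: it assumes a linear feedback policy `u = -k * x`, an Euler-Maruyama discretization of the environment SDE, and a finite-difference gradient estimate swapped in for the actor-critic update. All names and constants (`rollout_cost`, `train`, `a`, `b`, `sigma`, `r`, `lr`) are hypothetical choices for illustration.

```python
import numpy as np

def rollout_cost(k, noise, a=0.5, b=1.0, sigma=0.2, r=0.1, dt=0.01, x0=1.0):
    """Environment time: Euler-Maruyama rollout of
    dx = (a*x + b*u) dt + sigma dW with feedback u = -k*x.

    `noise` holds standard normal draws of shape (n_paths, n_steps).
    Returns the discretized quadratic cost averaged over paths.
    """
    n_paths, n_steps = noise.shape
    x = np.full(n_paths, x0)
    cost = np.zeros(n_paths)
    for t in range(n_steps):
        u = -k * x
        cost += (x**2 + r * u**2) * dt
        x = x + (a * x + b * u) * dt + sigma * np.sqrt(dt) * noise[:, t]
    return cost.mean()

def train(k0=0.0, lr=0.02, n_grad_steps=100, n_paths=256, n_steps=200,
          eps=1e-3, seed=0):
    """Gradient time: small-step gradient descent on the policy gain k."""
    rng = np.random.default_rng(seed)
    k = k0
    history = [k]
    for _ in range(n_grad_steps):
        noise = rng.standard_normal((n_paths, n_steps))
        # Central finite difference; both rollouts reuse the same noise
        # (common random numbers) to reduce the variance of the estimate.
        grad = (rollout_cost(k + eps, noise)
                - rollout_cost(k - eps, noise)) / (2 * eps)
        k -= lr * grad
        history.append(k)
    return np.array(history)

if __name__ == "__main__":
    gains = train()
    print("initial gain:", gains[0], "final gain:", gains[-1])
```

Each call to `rollout_cost` advances environment time for a fixed policy, while each iteration of the loop in `train` advances gradient time by one step of size `lr`. Shrinking `lr` while increasing `n_grad_steps` proportionally approximates the vanishing-step-size regime the results above refer to.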