title	source_url	ingested	sha256
From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments	https://arxiv.org/abs/2606.04275	2026-06-17	<computed>

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

Authors: Saket Tiwari, Tejas Kotwal, George Konidaris — Brown University, Dept. of Computer Science & Applied Mathematics

Published: ICLR 2026

arXiv: 2606.04275v1 [cs.LG] (2026-06-02)

Abstract

The paper proposes a novel theoretical framework for deep RL in continuous environments that models the problem as a continuous-time stochastic process, drawing on stochastic control. It introduces a viable model of actor-critic that incorporates both exploration and stochastic transitions. For single-hidden-layer neural networks, the state of the environment can be formulated as a two time-scale process (environment time and gradient time). Using stochastic differential equations, the authors derive, for the first time in continuous RL, an equation describing the infinitesimal change in the state distribution at each gradient step under a vanishingly small learning rate. The result is corroborated empirically on a toy LQR continuous control task.
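The "vanishingly small learning rate" limit can be pictured with a generic example that is not taken from the paper: on a quadratic loss, discrete gradient steps approach the continuous gradient flow as the step size shrinks while the total gradient time stays fixed. The loss, the names `gradient_descent` and `gradient_flow`, and the parameter values are assumptions chosen for illustration only.

```python
import math


def gradient_descent(theta0, lam, lr, total_time):
    """Discrete gradient descent on L(theta) = 0.5 * lam * theta**2.

    The number of steps is total_time / lr, so every run covers the
    same amount of "gradient time" regardless of the step size.
    """
    n_steps = round(total_time / lr)
    theta = theta0
    for _ in range(n_steps):
        theta -= lr * lam * theta  # gradient of the quadratic is lam * theta
    return theta


def gradient_flow(theta0, lam, total_time):
    """Closed-form solution of d(theta)/d(tau) = -lam * theta."""
    return theta0 * math.exp(-lam * total_time)


if __name__ == "__main__":
    flow = gradient_flow(1.0, 2.0, 1.0)
    for lr in (0.1, 0.01, 0.001):
        gd = gradient_descent(1.0, 2.0, lr, 1.0)
        print(f"lr={lr}: gap to gradient flow = {abs(gd - flow):.6f}")
```

The gap between the discrete iterate and the flow shrinks with the step size, which is the sense in which a per-step change can be treated as an infinitesimal change in gradient time.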

Key Concepts

continuous-time-rl / stochastic-differential-equation
wiener-process / ito-calculus
two-time-scale-process (environment time + gradient time)
exploratory-dynamics — SDE with policy + environment noise
linearized-neural-network / neural-tangent-kernel / infinite-width-limit
martingale-clt / control-affine-mdp
linear-quadratic-regulator
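
As a reading aid for the concepts listed above, a control-affine continuous-time environment driven by a Wiener process is commonly written in the form below, with the linear-quadratic regulator as a special case. The symbols f, g, σ, a_t, A, B, Q, and R are generic textbook notation assumed here, not the paper's own; the paper's exploratory dynamics also include policy noise, which this sketch does not attempt to reproduce.

```latex
\begin{aligned}
dX_t &= \bigl(f(X_t) + g(X_t)\,a_t\bigr)\,dt + \sigma(X_t)\,dW_t
  && \text{(control-affine SDE)} \\
dX_t &= \bigl(A X_t + B a_t\bigr)\,dt + \sigma\,dW_t
  && \text{(linear dynamics)} \\
J &= \mathbb{E}\!\int_0^T \bigl(X_t^\top Q X_t + a_t^\top R\,a_t\bigr)\,dt
  && \text{(quadratic cost)}
\end{aligned}
```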

Key Results

Closed system of only 5 time-dependent variables describing one-step gradient change
First equation for gradient-time evolution of state distribution under vanishing step size for NNs
Nonparametric formulation bridging stochastic control and over-parameterized RL
Exploratory dynamics outperforms additive Wiener noise in state-action coverage
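
To make the idea of a state distribution that depends on the policy concrete, here is a minimal sketch, assuming a scalar linear system with a linear feedback policy u = -k x. It is not the paper's experiment: the dynamics, the names `simulate_closed_loop` and `stationary_variance`, and all parameter values are hypothetical. The simulation uses Euler-Maruyama in environment time and compares the empirical state variance with the known stationary variance of the resulting Ornstein-Uhlenbeck process.

```python
import numpy as np


def simulate_closed_loop(k, a=0.5, b=1.0, sigma=0.3, x0=1.0,
                         dt=0.01, n_steps=2000, n_paths=4000, seed=0):
    """Euler-Maruyama for dx = (a*x + b*u) dt + sigma dW with u = -k*x.

    Returns the state of every simulated path at the final time.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        u = -k * x                                      # linear feedback policy
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Wiener increments
        x = x + (a * x + b * u) * dt + sigma * dw
    return x


def stationary_variance(k, a=0.5, b=1.0, sigma=0.3):
    """Stationary variance of the closed loop, valid only when b*k > a."""
    rate = b * k - a
    if rate <= 0:
        raise ValueError("closed loop is not stable for this gain")
    return sigma ** 2 / (2.0 * rate)


if __name__ == "__main__":
    for k in (1.5, 3.0):
        x_final = simulate_closed_loop(k)
        print(f"k={k}: empirical var={x_final.var():.4f}, "
              f"theory={stationary_variance(k):.4f}")
```

Changing the gain k, as a gradient step on the policy would, changes the distribution the state settles into. That is the kind of policy-dependent shift in the state distribution that the gradient-time equation in the paper describes for neural-network policies.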
