20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/raw/papers/tiwari-ticks-to-flows-2026.md
+++ b/raw/papers/tiwari-ticks-to-flows-2026.md
@@ -0,0 +1,35 @@
+---
+title: "From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments"
+source_url: https://arxiv.org/abs/2606.04275
+ingested: 2026-06-17
+sha256: <computed>
+---
+
+# From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
+
+**Authors:** Saket Tiwari, Tejas Kotwal, George Konidaris — Brown University, Dept. of Computer Science & Applied Mathematics
+
+**Published:** ICLR 2026
+
+**arXiv:** 2606.04275v1 [cs.LG] (2026-06-02)
+
+## Abstract
+
+Proposes a novel theoretical framework for deep RL in continuous environments that, drawing on stochastic control, models the problem as a continuous-time stochastic process. Introduces a viable actor-critic model that incorporates both exploration and stochastic transitions. For single-hidden-layer neural networks, the environment state can be formulated as a two-time-scale process (environment time + gradient time). Using stochastic differential equations (SDEs), it derives, for the first time in continuous RL, an equation for the infinitesimal change in the state distribution at each gradient step in the limit of a vanishingly small learning rate. The theory is corroborated empirically on a toy continuous-control task based on the linear quadratic regulator (LQR).
+
+## Key Concepts
+
+- [[continuous-time-rl|continuous-time RL]] / [[stochastic-differential-equation|stochastic differential equation (SDE)]]
+- [[wiener-process|Wiener process]] / [[ito-calculus|Itô calculus]]
+- [[two-time-scale-process|two-time-scale process]] (environment time + gradient time)
+- [[exploratory-dynamics|exploratory dynamics]]: an SDE driven by both policy (exploration) noise and environment noise
+- [[linearized-neural-network|linearized neural network]] / [[neural-tangent-kernel|NTK]] / [[infinite-width-limit|infinite-width limit]]
+- [[martingale-clt|martingale central limit theorem]] / [[control-affine-mdp|control-affine MDP]]
+- [[linear-quadratic-regulator|LQR]]
+
+## Key Results
+
+- A closed system of only five time-dependent variables describes the change produced by a single gradient step
+- First equation for the gradient-time evolution of the state distribution under a vanishing step size for neural networks
+- Nonparametric formulation bridging stochastic control and over-parameterized RL
+- Exploratory dynamics achieves better state-action coverage than additive Wiener noise
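One intuition for the coverage result can be sketched with a toy simulation. This is our own construction under stated assumptions, not the paper's exploratory dynamics: it assumes a scalar control-affine system dX = (A X + B a) dt + σ dW integrated with Euler-Maruyama, a hypothetical linear policy centred on a = kX, per-step Gaussian action sampling (a discrete-time simplification), and a hypothetical grid-based coverage metric. The names `rollout` and `coverage` and all parameter values are illustrative. Injecting noise only into the state leaves the action deterministic, so every visited (x, a) pair lies on the line a = kx; sampling the action itself spreads the pairs over the plane.

```python
import numpy as np


def rollout(k, exploratory, steps=20_000, dt=0.01, A=-0.5, B=1.0,
            env_sigma=1.0, noise_sigma=1.0, seed=0):
    """Euler-Maruyama rollout of the scalar control-affine SDE
        dX = (A X + B a) dt + sigma dW
    under a hypothetical linear policy centred on a = k X.

    exploratory=True : each step samples a = k X + noise_sigma * eta, eta ~ N(0, 1),
                       and the sampled action drives the state (policy + env noise).
    exploratory=False: the action is deterministic, a = k X, and the same noise
                       scale is added to the state as extra Wiener noise instead.
    Returns the visited states and the actions taken in them.
    """
    rng = np.random.default_rng(seed)
    xs = np.empty(steps)
    acts = np.empty(steps)
    x = 0.0
    for t in range(steps):
        if exploratory:
            a = k * x + noise_sigma * rng.standard_normal()
            sigma = env_sigma
        else:
            a = k * x
            sigma = np.sqrt(env_sigma**2 + noise_sigma**2)
        xs[t], acts[t] = x, a
        # Euler-Maruyama step: drift * dt + diffusion * sqrt(dt) * N(0, 1)
        x = x + (A * x + B * a) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return xs, acts


def coverage(xs, acts, lim=3.0, bins=20):
    """Fraction of cells of a bins x bins grid on [-lim, lim]^2 hit by (x, a) pairs."""
    width = 2 * lim / bins
    inside = (np.abs(xs) < lim) & (np.abs(acts) < lim)
    ix = np.clip(np.floor((xs[inside] + lim) / width).astype(int), 0, bins - 1)
    ia = np.clip(np.floor((acts[inside] + lim) / width).astype(int), 0, bins - 1)
    return np.unique(ix * bins + ia).size / bins**2


if __name__ == "__main__":
    for mode in (True, False):
        xs, acts = rollout(k=-1.0, exploratory=mode)
        label = "policy noise (exploratory-style)" if mode else "additive Wiener noise"
        print(f"{label}: coverage = {coverage(xs, acts):.3f}")
```

With noise only in the state, coverage is capped by the number of grid cells the line a = kx crosses, however large the noise. The comparison is illustrative only and does not reproduce the paper's experiments or its metric.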