20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/linear-quadratic-regulator.md
+++ b/concepts/linear-quadratic-regulator.md
@@ -0,0 +1,54 @@
+---
+title: "Linear Quadratic Regulator (LQR)"
+created: 2026-06-17
+updated: 2026-06-17
+type: concept
+tags: [control-theory, continuous-control, benchmark, optimization]
+sources: [raw/papers/tiwari-ticks-to-flows-2026.md]
+confidence: high
+---
+
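+# Linear Quadratic Regulator (LQR)
+
+LQR is **the most classic benchmark problem in optimal control theory**: the system dynamics are linear and the cost function is quadratic. In [[ticks-to-flows|Ticks-to-Flows]] it serves as the environment for the validation experiments.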
+
+## Problem formulation
+
+Dynamics: `s_{t+1} = A s_t + B a_t + noise`
+
+Cost (negative reward): `cost = Σ_t (s_t^T Q s_t + a_t^T R a_t)`
+
+Goal: find the linear policy `a_t = -K s_t` that minimizes the cumulative cost.
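+
+For reference, a standard textbook sketch of how the gain `K` follows from the Riccati equation. It assumes an infinite-horizon average cost, zero-mean noise independent of state and action, and a stabilizable pair `(A, B)`. None of these assumptions is stated in the source, and `P` is a symbol introduced here for the quadratic cost-to-go matrix.
+
+```latex
+% Discrete-time algebraic Riccati equation for the cost-to-go matrix P
+P = Q + A^\top P A - A^\top P B \,(R + B^\top P B)^{-1} B^\top P A
+% Optimal feedback gain in a_t = -K s_t
+K = (R + B^\top P B)^{-1} B^\top P A
+```
+
+Under these assumptions the additive noise changes only the achievable cost, not `K`, so the gain computed for the noise-free system is still the optimal linear policy.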
+
+## Configuration in the experiments
+
+The simplified LQR used in Ticks-to-Flows:
+
+```
+g(s) = s (self-driven drift)
+h(s) = 1 (action channel)
+σ(s) = 0.1 (small noise)
+r(s) = -500 s^2 (strong penalty for deviating from the origin)
+s_0 = 2.0, T = 1, Δt = 0.02
+```
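+
+A sketch of the discretized dynamics this configuration would imply, assuming the control-affine form `ds = (g(s) + h(s) a) dt + σ(s) dW` and an Euler-Maruyama step. Both are assumptions made for illustration, and `ξ_k` and `N` are symbols introduced here.
+
+```latex
+% One Euler-Maruyama step with g(s) = s, h(s) = 1, \sigma(s) = 0.1, \Delta t = 0.02
+s_{k+1} = s_k + (s_k + a_k)\,\Delta t + 0.1\,\sqrt{\Delta t}\;\xi_k, \qquad \xi_k \sim \mathcal{N}(0, 1)
+% Number of steps per episode
+N = T / \Delta t = 1 / 0.02 = 50
+```
+
+With `a_k = 0` the drift `g(s) = s` pushes the state away from the origin, so the policy has to counteract it to avoid the `-500 s^2` penalty.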
+
+Extended to multiple state dimensions: `d_s` = 2, 8, 32.
+
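+## Why LQR
+
+1. **Closed-form solution**: the Riccati equation gives the optimal policy
+2. **Verifiability**: theoretical predictions can be compared against the optimal solution
+3. **Compatible with linearization**: the linear structure of LQR matches the [[linearized-neural-network|linearization]] of the NN
+4. **Scaling**: scalability can be tested across different state dimensions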
+
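+## Connection to the theoretical results
+
+The prediction of Theorem 6.1 (a closed system of 5 variables) is verified empirically on LQR:
+- The discrete simulation of the theoretical model (black dashed line in Figure 6) closely matches the empirical actor-critic trajectories
+- Near-optimal policies are learned from 1D through 32D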
+
+## References
+
+- [[control-affine-mdp|Control-affine MDP]]
+- [[ticks-to-flows|Ticks to Flows]]
+- [[continuous-time-rl|Continuous-time RL]]