20260617: currently 914 pages
54
concepts/linear-quadratic-regulator.md
Normal file
@@ -0,0 +1,54 @@
---
title: "Linear Quadratic Regulator (LQR)"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [control-theory, continuous-control, benchmark, optimization]
sources: [raw/papers/tiwari-ticks-to-flows-2026.md]
confidence: high
---

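# Linear Quadratic Regulator (LQR)

LQR is **the most classic benchmark problem in optimal control theory**: the system dynamics are linear and the cost function is quadratic. [[ticks-to-flows|Ticks-to-Flows]] uses it as the environment for its validation experiments.
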
## Problem formulation

Dynamics: `s_{t+1} = A s_t + B a_t + noise`

Cost (negative reward): `cost = Σ_t (s_t^T Q s_t + a_t^T R a_t)`

Objective: find the linear policy `a_t = -K s_t` that minimizes the cumulative cost.

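
The formulation above can be sketched for the scalar case, where `A`, `B`, `Q`, `R` reduce to numbers `a`, `b`, `q`, `r`. This is a minimal illustration, assuming an infinite-horizon, undiscounted cost; the function names and the numeric values are hypothetical and are not taken from the Ticks-to-Flows paper.

```python
def scalar_lqr_gain(a, b, q, r, iters=1000, tol=1e-12):
    """Iterate the scalar discrete-time Riccati recursion to a fixed point.

    Returns (k, p): the gain of the policy a_t = -k * s_t and the
    cost-to-go coefficient p, so the optimal noise-free cost from s is p * s**2.
    """
    p = q  # start from the one-step cost
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)
    return k, p


def rollout_cost(a, b, q, r, k, s0, steps):
    """Noise-free cumulative cost of the linear policy a_t = -k * s_t."""
    s, total = s0, 0.0
    for _ in range(steps):
        u = -k * s
        total += q * s * s + r * u * u
        s = a * s + b * u
    return total


# Hypothetical system: mildly unstable drift, weak actuation.
k_opt, p_opt = scalar_lqr_gain(a=1.02, b=0.02, q=1.0, r=0.1)
```

Rolling out `rollout_cost` with gains slightly above or below `k_opt` gives a higher cumulative cost, which is a quick sanity check that the recursion found the minimizer.
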
## Configuration in the experiments

The simplified LQR used in Ticks-to-Flows:

```
g(s) = s          (self-driven drift)
h(s) = 1          (action channel)
σ(s) = 0.1        (small noise)
r(s) = -500 s^2   (strong penalty for deviating from the origin)
s_0 = 2.0, T = 1, Δt = 0.02
```

The setup is extended to multiple state dimensions: ds = 2, 8, 32.

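
To make the 1D configuration concrete, here is a rough Euler-Maruyama rollout. It assumes the config describes a control-affine SDE of the form `ds = (g(s) + h(s) a) dt + σ(s) dW`, that the reward is accumulated as `r(s) Δt`, and that the policy is a linear feedback `a = -k s` with a hypothetical gain `k`. None of these choices is confirmed by the source; they are stated only to make the sketch runnable.

```python
import math
import random


def simulate(k, s0=2.0, T=1.0, dt=0.02, sigma=0.1, seed=0):
    """Roll out one episode of the simplified 1D LQR under a = -k * s.

    Returns (states, total_reward); states has round(T / dt) + 1 entries.
    """
    rng = random.Random(seed)       # seeded for reproducibility
    steps = round(T / dt)           # 50 steps for T = 1, dt = 0.02
    s = s0
    states = [s]
    total_reward = 0.0
    for _ in range(steps):
        a = -k * s                              # hypothetical linear policy
        total_reward += -500.0 * s * s * dt     # r(s) = -500 s^2, weighted by dt (assumption)
        drift = s + 1.0 * a                     # g(s) + h(s) * a with g(s) = s, h(s) = 1
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        s = s + drift * dt + noise
        states.append(s)
    return states, total_reward


# Uncontrolled vs. controlled rollout, noise switched off for comparison.
free_states, free_reward = simulate(k=0.0, sigma=0.0)
ctrl_states, ctrl_reward = simulate(k=5.0, sigma=0.0)
```

With `k = 0` the drift `g(s) = s` pushes the state away from the origin, so any stabilizing gain should end closer to zero and collect a less negative reward.
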
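## Why LQR

1. **Closed-form solution**: the Riccati equation yields the optimal policy
2. **Verifiability**: theoretical predictions can be compared against the optimal solution
3. **Compatible with linearization**: the linear structure of LQR itself matches the [[linearized-neural-network|linearization]] of the NN
4. **Scalability**: performance can be tested across different state dimensions
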
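
For reference, the closed-form solution mentioned in item 1 takes the following standard form for the discrete-time, infinite-horizon, undiscounted problem with dynamics `s_{t+1} = A s_t + B a_t` and policy `a_t = -K s_t`. This is the textbook statement, not a formula quoted from the Ticks-to-Flows paper, whose experiments may use a continuous-time or finite-horizon variant.

$$
P = Q + A^\top P A - A^\top P B \,(R + B^\top P B)^{-1} B^\top P A,
\qquad
K = (R + B^\top P B)^{-1} B^\top P A
$$

Here `P` is the symmetric cost-to-go matrix, so the optimal noise-free cost from state `s` is `s^T P s`.
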
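## Relation to the theoretical results

The predictions of Theorem 6.1 (a closed system of 5 variables) are validated empirically on LQR:

- The theoretical model, simulated in discrete time (black dashed line in Figure 6), closely matches the empirical actor-critic trajectories
- Near-optimal policies are learned from 1D up to 32D
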
## References

- [[control-affine-mdp|Control-affine MDP]]
- [[ticks-to-flows|Ticks to Flows]]
- [[continuous-time-rl|Continuous-time RL]]