---
title: "Linear Quadratic Regulator (LQR)"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [control-theory, continuous-control, benchmark, optimization]
sources: [raw/papers/tiwari-ticks-to-flows-2026.md]
confidence: high
---
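# Linear Quadratic Regulator (LQR)
LQR is **the classic benchmark problem of optimal control theory**: the system dynamics are linear and the cost function is quadratic. In [[ticks-to-flows|Ticks-to-Flows]] it serves as the experimental environment for validation.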
## Problem Formulation
- Dynamics: `s_{t+1} = A s_t + B a_t + noise`
- Cost (negative reward): `cost = Σ (s_t^T Q s_t + a_t^T R a_t)`
- Goal: find the linear policy `a_t = -K s_t` that minimizes the cumulative cost
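
The gain `K` in the formulation above can be computed numerically. The following is a minimal sketch, assuming an infinite-horizon, undiscounted cost and additive zero-mean noise (which does not change the optimal gain). The function name `dlqr_gain` and the scalar example values are hypothetical illustrations and do not come from Ticks-to-Flows.

```python
import numpy as np

def dlqr_gain(A, B, Q, R, iters=500, tol=1e-10):
    """Iterate the discrete-time Riccati recursion to a fixed point P,
    then return the gain K of the policy a_t = -K s_t."""
    P = Q.copy()
    for _ in range(iters):
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P <- Q + A^T P (A - B K)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Recompute the gain from the final P.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical scalar example (values chosen only for illustration).
A = np.array([[1.0]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
K, P = dlqr_gain(A, B, Q, R)
```

For this scalar case the fixed point satisfies `P^2 - P - 1 = 0`, so `P` should approach the golden ratio, which gives a quick sanity check on the recursion.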
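## Configuration in the Experiments
The simplified LQR used in Ticks-to-Flows:
```
g(s) = s          (self-driven drift)
h(s) = 1          (action channel)
σ(s) = 0.1        (small noise)
r(s) = -500 s^2   (strong penalty for deviating from the origin)
s_0 = 2.0, T = 1, Δt = 0.02
```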
The setup is extended to higher state dimensions, `d_s` = 2, 8, 32.
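
To make the 1D configuration concrete, here is a rough rollout sketch. It assumes the control-affine form `ds = (g(s) + h(s) a) dt + σ(s) dW` discretized with Euler-Maruyama, and accumulates the return as `r(s) Δt`; both are assumptions, not details confirmed by the source. The function `rollout` and the linear feedback gain `k` are hypothetical and stand in for whatever policy is being evaluated.

```python
import random

def rollout(k, s0=2.0, T=1.0, dt=0.02, sigma=0.1, seed=0):
    """Euler-Maruyama rollout of ds = (s + a) dt + sigma dW
    under the hypothetical linear feedback a = -k s.
    Returns the final state and the accumulated reward sum r(s) * dt."""
    rng = random.Random(seed)
    n_steps = round(T / dt)              # 50 steps for T = 1, dt = 0.02
    s, total_reward = s0, 0.0
    for _ in range(n_steps):
        a = -k * s                       # hypothetical linear policy
        total_reward += -500.0 * s * s * dt   # r(s) = -500 s^2
        dw = rng.gauss(0.0, dt ** 0.5)   # Brownian increment ~ N(0, dt)
        s += (s + a) * dt + sigma * dw   # g(s) = s, h(s) = 1
    return s, total_reward

# Usage: compare an uncontrolled rollout with a stabilizing gain.
free_state, free_reward = rollout(k=0.0)
ctrl_state, ctrl_reward = rollout(k=5.0)
```

With `k = 0` the drift `g(s) = s` pushes the state away from the origin, so any gain `k > 1` is needed just to make the closed loop contract.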
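## Why LQR
1. **Analytic solution**: the Riccati equation gives the optimal policy
2. **Verifiability**: theoretical predictions can be compared against the optimal solution
3. **Compatibility with linearization**: LQR's own linear structure is consistent with the NN's [[linearized-neural-network|linearization]]
4. **Scalability**: scaling can be tested across different state dimensions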
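
As an illustration of the analytic-solution point, the scalar continuous-time Riccati equation has a closed form. The sketch below assumes an infinite-horizon stationary problem `ds = (a s + b u) dt` with running cost `q s^2 + r u^2`. The action-cost weight `r = 1.0` is hypothetical, since the configuration on this page lists no action cost, and a finite horizon such as `T = 1` would instead call for a time-varying Riccati ODE.

```python
import math

def scalar_care(a, b, q, r):
    """Stabilizing root p of 2 a p - (b^2 / r) p^2 + q = 0
    and the corresponding feedback gain k = b p / r."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k

# a = 1, b = 1, q = 500 mirror g(s) = s, h(s) = 1, r(s) = -500 s^2;
# r = 1.0 is a hypothetical action-cost weight.
a, b, q, r = 1.0, 1.0, 500.0, 1.0
p, k = scalar_care(a, b, q, r)

# Check: the Riccati residual should vanish and the closed loop be stable.
residual = 2 * a * p - (b * b / r) * p * p + q
closed_loop = a - b * k          # equals -sqrt(a^2 + b^2 q / r)
```

A learned gain can then be compared against `k`, which is the kind of check the verifiability point refers to.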
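## Connection to the Theoretical Results
The prediction of Theorem 6.1 (a closed system of 5 variables) is verified empirically on LQR:
- The theoretical model, run as a discrete simulation (black dashed line in Figure 6), closely matches the empirical actor-critic trajectories
- Near-optimal policies are learned from 1D through 32D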
## References
- [[control-affine-mdp|Control-Affine MDP]]
- [[ticks-to-flows|Ticks to Flows]]
- [[continuous-time-rl|Continuous-Time RL]]