20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/control-affine-mdp.md
+++ b/concepts/control-affine-mdp.md
@@ -0,0 +1,57 @@
+---
+title: "Control-Affine MDP"
+created: 2026-06-17
+updated: 2026-06-17
+type: concept
+tags: [reinforcement-learning, control-theory, theory]
+sources: [raw/papers/tiwari-ticks-to-flows-2026.md]
+confidence: high
+---
+
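+# Control-Affine MDP
+
+A control-affine MDP is the **continuous-time MDP with continuous state and action spaces** defined in [[ticks-to-flows|Ticks-to-Flows]]: the action enters the dynamics **linearly (the dynamics are affine in the action)**, while the environment and the reward may be highly nonlinear.
+
+## Formal definition
+
+M = (S, A, ⟨g, h, σ⟩, r, s₀, β), with dynamics: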
+
+```
+ds_t = (g(s_t) + h(s_t) a_t) dt + σ(s_t) dW_t
+```
+
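+- `g: R^{ds} → R^{ds}`: autonomous dynamics (the uncontrolled drift)
+- `h: R^{ds} → R^{ds×da}`: the **control-affine term** (the action enters the dynamics linearly)
+- `σ: R^{ds} → R^{ds×ds}`: environment noise (independent of the action)
+- `r: R^{ds} → R`: smooth reward function
+- β ∈ (0,1): discount factor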
+
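+## What "control-affine" means
+
+The action `a_t` appears in the dynamics only **linearly** (through `h(s_t)a_t`), while `g`, `h`, `σ`, and `r` may all be **nonlinear smooth functions**. This structure:
+
+- is easier to analyze than general nonlinear control
+- covers the vast majority of physical control problems
+- makes the analysis of exploration dynamics more tractable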
+
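+## Assumptions
+
+- **Smoothness**: g, h, σ, r are infinitely differentiable
+- **Lipschitz continuity**: guarantees existence and uniqueness of the SDE solution
+- **Policy admissibility**: the policy must be smooth and Lipschitz (guarantees well-posedness of the closed-loop SDE)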
+
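+## Comparison with discrete MDPs
+
+| Dimension | Standard MDP | Control-affine MDP |
+|------|---------|-------------|
+| Time | Discrete, t=0,1,2... | Continuous, t∈[0,T) |
+| Transition | P(s'\|s,a) | SDE for ds_t |
+| Reward | r(s,a) | r(s) (depends on the state only) |
+| Control structure | Arbitrary | Affine (g + h·a) |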
+
+## References
+
+- [[continuous-time-rl|Continuous-time RL]]
+- [[stochastic-differential-equation|SDE]]
+- [[linear-quadratic-regulator|LQR]]
+- [[ticks-to-flows|Ticks to Flows]]
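
The SDE in the formal definition of `concepts/control-affine-mdp.md` can be explored numerically. Below is a minimal sketch, assuming an Euler-Maruyama discretization with step `dt`. Neither the discretization nor the example system comes from the source: the noisy pendulum (`ds = 2`, `da = 1`), the linear feedback policy, and the function name `euler_maruyama` are all hypothetical choices made for illustration. The example functions are smooth and Lipschitz, so they are consistent with the assumptions listed in the note.

```python
import numpy as np

def euler_maruyama(g, h, sigma, policy, s0, dt, n_steps, rng):
    """Simulate ds = (g(s) + h(s) a) dt + sigma(s) dW with a = policy(s)."""
    s = np.asarray(s0, dtype=float)
    path = np.empty((n_steps + 1, s.size))
    path[0] = s
    for k in range(n_steps):
        a = policy(s)
        drift = g(s) + h(s) @ a  # affine in the action a
        dW = rng.normal(0.0, np.sqrt(dt), size=s.size)  # Brownian increment
        s = s + drift * dt + sigma(s) @ dW
        path[k + 1] = s
    return path

# Hypothetical example system (an assumption, not taken from the source).
g = lambda s: np.array([s[1], -np.sin(s[0])])            # nonlinear uncontrolled drift
h = lambda s: np.array([[0.0], [1.0]])                   # action acts on the velocity
sigma = lambda s: 0.1 * np.eye(2)                        # action-independent noise
policy = lambda s: np.array([-2.0 * s[0] - 1.0 * s[1]])  # smooth, Lipschitz feedback

path = euler_maruyama(g, h, sigma, policy, s0=[1.0, 0.0], dt=0.01,
                      n_steps=1000, rng=np.random.default_rng(0))
```

Note that the action only ever enters through the matrix-vector product `h(s) @ a`, which is the affine structure the note describes; all nonlinearity lives in `g`, `h`, `sigma`, and the policy itself.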