20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/taylor-expansion-q-function.md
+++ b/concepts/taylor-expansion-q-function.md
@@ -0,0 +1,52 @@
+---
+title: "Taylor Expansion of the Q-Function"
+created: 2026-06-17
+updated: 2026-06-17
+type: concept
+tags: [reinforcement-learning, theory, mdp, taylor-expansion]
+sources: [raw/papers/chen-bellman-taylor-score-2026.md]
+confidence: high
+---
+
+# Taylor Expansion of the Q-Function
+
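+The [[bellman-taylor-score-decoding|BTSD]] framework builds the theoretical foundation of score decoding on a first-order (or higher-order) Taylor expansion of the optimal Q-function.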
+
+## First-Order Expansion
+
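+Expanding `G*_s` to first order around the reference post-action point `x_ref(s) = φ_s(a_ref(s))` gives the following. Here `const` collects every term that does not depend on `a`, namely `γ G*_s(x_ref) - γ⟨∇G*_s(x_ref), x_ref⟩`: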
+
+```
+Q*(s,a) = ψ_s(a) + γ G*_s(φ_s(a))
+         ≈ const + ψ_s(a) + γ⟨∇G*_s(x_ref), φ_s(a)⟩
+```
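A minimal numerical check of the first-order expansion, assuming a hypothetical toy instance. `phi`, `psi`, `G`, `grad_G` and `GAMMA` are illustrative stand-ins, not objects from the source. The check confirms that `const` does not depend on `a`, so ranking actions only needs the score `ψ_s(a) + γ⟨∇G*_s(x_ref), φ_s(a)⟩`.

```python
import numpy as np

# Hypothetical toy instance: none of these functions come from the source.
GAMMA = 0.9


def phi(a: int) -> np.ndarray:
    """Hypothetical post-action map phi_s(a) for a fixed state s."""
    return np.array([1.0 + 0.1 * a, 2.0 - 0.05 * a])


def psi(a: int) -> float:
    """Hypothetical immediate term psi_s(a) (a linear action cost)."""
    return -0.2 * a


def G(x: np.ndarray) -> float:
    """Hypothetical continuation value G*_s with mild curvature (Hessian -0.02 * I)."""
    return float(3.0 * x[0] - x[1] - 0.01 * (x @ x))


def grad_G(x: np.ndarray) -> np.ndarray:
    """Analytic gradient of G."""
    return np.array([3.0, -1.0]) - 0.02 * x


def exact_q(a: int) -> float:
    # Q*(s, a) = psi_s(a) + gamma * G*_s(phi_s(a))
    return psi(a) + GAMMA * G(phi(a))


def first_order_score(a: int, x_ref: np.ndarray) -> float:
    # a-dependent part: psi_s(a) + gamma * <grad G*_s(x_ref), phi_s(a)>
    return psi(a) + GAMMA * float(grad_G(x_ref) @ phi(a))


def const_term(x_ref: np.ndarray) -> float:
    # a-independent part: gamma * G*_s(x_ref) - gamma * <grad G*_s(x_ref), x_ref>
    return GAMMA * (G(x_ref) - float(grad_G(x_ref) @ x_ref))


x_ref = phi(2)  # expand around the hypothetical reference action a_ref(s) = 2
for a in range(5):
    approx = const_term(x_ref) + first_order_score(a, x_ref)
    print(f"a={a}  exact={exact_q(a):.5f}  first-order={approx:.5f}")
```

`G` here has Hessian `-0.02·I`. The gap between the exact and first-order values is therefore exactly `-0.01·γ·‖φ_s(a) - x_ref‖²`, which vanishes at the reference action.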
+
+## Higher-Order Generalization
+
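+For a K-th order expansion, use multi-index notation `m = (m1,...,md)`, with `|m| = m1+...+md`, `m! = m1!···md!`, `x^m = x1^m1···xd^md`, and `∇^m` the corresponding mixed partial derivative `∂^|m| / ∂x1^m1···∂xd^md`: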
+
+```
+Q*(s,a) ≈ const + ψ_s(a) + Σ_{1≤|m|≤K} γ ∇^m G*_s(x_ref) · (φ_s(a) - x_ref)^m / m!
+```
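The K-th order formula can be sketched as below, assuming the mixed partials of `G*_s` are supplied through a callable (`deriv`, hypothetical). The expansion is taken in powers of the displacement `φ_s(a) - x_ref`, as in a standard multivariate Taylor polynomial. The test function `G(x) = exp(⟨w, x⟩)` is an illustrative choice: all of its partials are known in closed form, `∇^m G(x) = w^m exp(⟨w, x⟩)`.

```python
import math
from itertools import product

import numpy as np


def multi_indices(d: int, K: int) -> list[tuple[int, ...]]:
    """All multi-indices m = (m1, ..., md) with 1 <= |m| <= K."""
    return [m for m in product(range(K + 1), repeat=d) if 1 <= sum(m) <= K]


def monomial(x: np.ndarray, m: tuple[int, ...]) -> float:
    """x^m = x1^m1 * ... * xd^md."""
    return math.prod(float(xi) ** mi for xi, mi in zip(x, m))


def mfactorial(m: tuple[int, ...]) -> int:
    """m! = m1! * ... * md!."""
    return math.prod(math.factorial(mi) for mi in m)


def taylor_G(deriv, x_ref: np.ndarray, x: np.ndarray, K: int) -> float:
    """K-th order Taylor polynomial of G at x, expanded around x_ref.

    deriv(m, x_ref) must return the mixed partial nabla^m G(x_ref);
    deriv((0,) * d, x_ref) is G(x_ref) itself.
    """
    d = len(x_ref)
    dx = x - x_ref  # displacement phi_s(a) - x_ref
    total = deriv((0,) * d, x_ref)
    for m in multi_indices(d, K):
        total += deriv(m, x_ref) * monomial(dx, m) / mfactorial(m)
    return total


# Hypothetical test function G(x) = exp(<w, x>): nabla^m G(x) = w^m * exp(<w, x>).
w = np.array([0.5, -0.3])


def deriv_exp(m: tuple[int, ...], x: np.ndarray) -> float:
    return monomial(w, m) * math.exp(float(w @ x))


x_ref = np.zeros(2)
x = np.array([0.4, 0.2])
exact = math.exp(float(w @ x))
for K in range(1, 5):
    err = abs(taylor_G(deriv_exp, x_ref, x, K) - exact)
    print(f"K={K}  |error|={err:.2e}")
```

For this `G`, the multinomial theorem collapses the sum to `Σ_{k≤K} t^k / k!` with `t = ⟨w, x - x_ref⟩`. The printed error therefore shrinks like the tail of the exponential series. To obtain the Q approximation of the formula above, multiply the result by `γ` and add `ψ_s(a)`.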
+
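+The higher-order monomials `φ_s(a)^m` are entries of the **tensor-product** features `φ_s(a)^{⊗k}` with `k = |m|`. The [[action-decoder|decoder]] can use these as a richer feature representation.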
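A sketch of such a feature map follows. It assumes the decoder simply consumes the distinct monomials of `φ_s(a)` up to degree `K`; the function names are hypothetical. Because `φ_s(a)^{⊗k}` is symmetric, only one entry per sorted index tuple is kept.

```python
import math
from itertools import combinations_with_replacement

import numpy as np


def monomial_features(x: np.ndarray, K: int) -> np.ndarray:
    """Distinct monomials x^m with 1 <= |m| <= K, grouped by degree k = |m|.

    Each sorted index tuple (i1 <= ... <= ik) picks one distinct entry
    x[i1] * ... * x[ik] of the symmetric tensor power x^{⊗k}.
    """
    feats = []
    for k in range(1, K + 1):
        for idx in combinations_with_replacement(range(len(x)), k):
            feats.append(math.prod(x[i] for i in idx))
    return np.array(feats, dtype=float)


def tensor_power(x: np.ndarray, k: int) -> np.ndarray:
    """Full tensor power x^{⊗k} of shape (d,) * k (contains repeated entries)."""
    t = np.array(1.0)
    for _ in range(k):
        t = np.multiply.outer(t, x)
    return t


phi_a = np.array([2.0, 3.0, 5.0])  # hypothetical post-action point phi_s(a)
print(monomial_features(phi_a, 2))
```

The number of distinct features is `C(d+K, K) - 1`, which grows quickly with both the dimension `d` and the order `K`.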
+
+## Theoretical Implications
+
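+- For the first-order expansion, the **structural approximation error** is controlled by the Taylor remainder: `ε_approx ∝ ‖∇^2 G*_s‖ · ‖φ_s(a) - x_ref‖^2`, where `‖∇^2 G*_s‖` is the Hessian norm along the segment from `x_ref` to `φ_s(a)`
+- When G* is close to linear (as in many queueing systems), the first-order approximation is nearly exact
+- When G* has significant curvature, higher-order terms must be kept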
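The scaling of the remainder can be checked on a hypothetical quadratic `G(x) = ⟨b, x⟩ - (c/2)‖x‖²`, whose Hessian is `-c·I`. For this family the first-order remainder equals `(c/2)‖x - x_ref‖²` exactly. So `c = 0` (a linear `G`) gives a zero remainder, and doubling the distance quadruples it.

```python
import numpy as np


def first_order_remainder(G, grad_G, x_ref: np.ndarray, x: np.ndarray) -> float:
    """|G(x) - G(x_ref) - <grad G(x_ref), x - x_ref>|: the first-order Taylor remainder."""
    return abs(G(x) - G(x_ref) - float(grad_G(x_ref) @ (x - x_ref)))


def make_quadratic(c: float):
    """Hypothetical G(x) = <b, x> - (c/2) ||x||^2 with Hessian -c * I, so ||∇²G|| = c."""
    b = np.array([1.0, -2.0])

    def G(x: np.ndarray) -> float:
        return float(b @ x - 0.5 * c * (x @ x))

    def grad_G(x: np.ndarray) -> np.ndarray:
        return b - c * x

    return G, grad_G


x_ref = np.array([0.5, 0.5])
direction = np.array([0.6, 0.8])  # unit vector
for c in (0.0, 0.1, 1.0):
    G, grad_G = make_quadratic(c)
    errs = [first_order_remainder(G, grad_G, x_ref, x_ref + r * direction) for r in (0.5, 1.0, 2.0)]
    print(f"c={c}: remainders at distance 0.5, 1, 2 -> {errs}")
```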
+
+## Performance Guarantee
+
+```
+|J(π*) - J(π_BTSD)| ≤ ε_approx(G*) + ε_learn(DRL)
+```
+
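+The first term depends only on the intrinsic structure of the Q-function (the Taylor remainder). The second depends on the learning capability of the DRL algorithm.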
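The per-state mechanism behind the `ε_approx` term can be illustrated with a standard one-step argument; this is not the source's proof of the bound above. Suppose the decoding scores equal `Q*(s, ·)` up to an `a`-independent constant plus an error bounded by `ε`. Then the greedy action `â` satisfies `Q*(s, â) ≥ max_a Q*(s, a) - 2ε`. The sketch below checks this numerically on random, hypothetical Q values. Its constant offset shows why dropping `const` does not affect decoding.

```python
import numpy as np


def decoding_loss(q_exact: np.ndarray, q_approx: np.ndarray) -> float:
    """One-step loss of decoding greedily with q_approx, measured in exact Q*."""
    a_hat = int(np.argmax(q_approx))
    return float(q_exact.max() - q_exact[a_hat])


rng = np.random.default_rng(0)
eps = 0.05
for trial in range(3):
    q_exact = rng.normal(size=10)  # hypothetical Q*(s, .) over 10 actions
    # Approximate scores: bounded error |e| <= eps plus an arbitrary constant,
    # mimicking a score that omits the a-independent `const` term.
    q_approx = q_exact + rng.uniform(-eps, eps, size=10) + 7.0
    print(trial, decoding_loss(q_exact, q_approx), "<=", 2 * eps)
```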
+
+## References
+
+- [[bellman-taylor-score-decoding|BTSD]]
+- [[continuation-value-function|continuation value function]]
+- [[post-action-configuration|post-action configuration]]