---
title: "Taylor Expansion of the Q-Function"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [reinforcement-learning, theory, mdp, taylor-expansion]
sources: [raw/papers/chen-bellman-taylor-score-2026.md]
confidence: high
---
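# Taylor Expansion of the Q-Function
The [[bellman-taylor-score-decoding|BTSD]] framework establishes the theoretical basis for score decoding by taking a first-order or higher-order Taylor expansion of the optimal Q-function.
## First-Order Expansion
Expand around the reference post-action point `x_ref(s) = φ_s(a_ref(s))`: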
```
Q*(s,a) = ψ_s(a) + γ G*_s(φ_s(a))
≈ const + ψ_s(a) + γ⟨∇G*_s(x_ref), φ_s(a)⟩
```
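The first-order form above means an action can be ranked by `ψ_s(a) + γ⟨∇G*_s(x_ref), φ_s(a)⟩` alone, because the constant does not depend on `a`. The following is a minimal sketch of that ranking, assuming a toy two-queue system; the function names, the state, the gradient values, and the discount factor are all hypothetical and are not taken from the source paper.

```python
# Toy sketch: every name and number here is illustrative, not taken from the BTSD paper.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def first_order_score(psi, phi, grad_g_ref, gamma):
    """psi_s(a) + gamma * <grad G*_s(x_ref), phi_s(a)>, dropping the action-independent constant."""
    return psi + gamma * dot(grad_g_ref, phi)

def decode(actions, psi_fn, phi_fn, grad_g_ref, gamma):
    """Return the action with the highest first-order score."""
    return max(
        actions,
        key=lambda a: first_order_score(psi_fn(a), phi_fn(a), grad_g_ref, gamma),
    )

# Assumed two-queue example: action a serves one job from queue a.
state = (3, 1)

def phi_fn(a):
    # Post-action configuration: queue a shrinks by one job.
    return tuple(q - 1 if i == a else q for i, q in enumerate(state))

def psi_fn(a):
    # Assumed zero immediate reward, so only the continuation term matters.
    return 0.0

grad_g_ref = (-2.0, -1.0)  # assumed gradient of G*_s at x_ref (holding costs 2 and 1)
best = decode([0, 1], psi_fn, phi_fn, grad_g_ref, gamma=0.9)
```

In this toy setup `best` is `0`: serving the queue with the larger assumed holding cost gives the higher score.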
## Higher-Order Generalization
For a K-th order expansion, use the multi-index notation `m = (m1,...,md)`:
```
Q*(s,a) ≈ const + ψ_s(a) + Σ_{|m|=1}^{K} γ ∇^m G*_s(x_ref) · (φ_s(a) - x_ref)^m / m!
```
The higher-order terms `φ_s(a)^m` give rise to **tensor-product** features `φ_s(a)^{⊗m}`, which the [[action-decoder|decoder]] can use as a richer feature representation.
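To make the multi-index notation concrete, here is a small sketch that enumerates every multi-index with `1 ≤ |m| ≤ K` and evaluates the corresponding monomial `x^m / m!`. The helper names `multi_indices` and `monomial_features` are hypothetical, and the input `x` stands for whichever vector is raised to the power `m` in the expansion.

```python
from itertools import product
from math import factorial, prod

def multi_indices(d, K):
    """All multi-indices m in N^d with 1 <= |m| <= K, where |m| = m1 + ... + md."""
    return [m for m in product(range(K + 1), repeat=d) if 1 <= sum(m) <= K]

def monomial_features(x, K):
    """Map x to {m: x^m / m!}, with x^m = prod_i x_i^{m_i} and m! = prod_i m_i!."""
    feats = {}
    for m in multi_indices(len(x), K):
        x_pow = prod(xi ** mi for xi, mi in zip(x, m))
        m_fact = prod(factorial(mi) for mi in m)
        feats[m] = x_pow / m_fact
    return feats

# Illustrative input only: a 2-dimensional vector expanded to order K = 2.
features = monomial_features((2.0, 3.0), K=2)
```

For `d = 2` and `K = 2` this yields five features (two linear, three quadratic). The count grows combinatorially with `d` and `K`, which is one reason a decoder might keep only low orders.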
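## Theoretical Significance
- The **structural approximation error** of the first-order expansion is controlled by the Taylor remainder: `ε_approx ∝ |∇^2 G*_s| · ‖φ_s(a) - x_ref‖^2`
- When G* is nearly linear (as in many queueing systems), the first-order approximation is almost exact
- When G* has significant curvature, higher-order terms must be retained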
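The points above can be checked numerically in one dimension. The sketch below uses two made-up stand-ins for G* (a linear function and `x^2`), chosen only for illustration, and measures the first-order Taylor error at increasing distance from `x_ref`.

```python
def first_order_error(G, dG, x_ref, x):
    """|G(x) - [G(x_ref) + G'(x_ref) * (x - x_ref)]|, the first-order Taylor remainder."""
    return abs(G(x) - (G(x_ref) + dG(x_ref) * (x - x_ref)))

# Hypothetical stand-ins for G*: each pair is (function, derivative).
linear = (lambda x: 3.0 * x + 1.0, lambda x: 3.0)
curved = (lambda x: x * x, lambda x: 2.0 * x)

err_linear = first_order_error(*linear, x_ref=2.0, x=5.0)  # zero curvature
err_small = first_order_error(*curved, x_ref=2.0, x=2.5)   # offset 0.5
err_large = first_order_error(*curved, x_ref=2.0, x=3.0)   # offset 1.0
```

The linear case has zero error, while for the curved case doubling the offset from `x_ref` quadruples the error (0.25 versus 1.0), matching the quadratic dependence on `‖φ_s(a) - x_ref‖`.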
## Optimization Performance Guarantee
```
|J(π*) - J(π_BTSD)| ≤ ε_approx(G*) + ε_learn(DRL)
```
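The first term depends only on the intrinsic structure of the Q-function (the Taylor remainder); the second term depends on the learning capability of the DRL algorithm.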
## References
- [[bellman-taylor-score-decoding|BTSD]]
- [[continuation-value-function|Continuation value function]]
- [[post-action-configuration|Post-action configuration]]