20260617: currently 914 pages
52
concepts/taylor-expansion-q-function.md
Normal file
@@ -0,0 +1,52 @@
---
title: "Taylor Expansion of the Q-Function"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [reinforcement-learning, theory, mdp, taylor-expansion]
sources: [raw/papers/chen-bellman-taylor-score-2026.md]
confidence: high
---

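# Taylor Expansion of the Q-Function

The [[bellman-taylor-score-decoding|BTSD]] framework establishes the theoretical basis for score decoding by taking a first-order (or higher-order) Taylor expansion of the optimal Q-function.
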
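## First-Order Expansion

Expand around the reference post-action point `x_ref(s) = φ_s(a_ref(s))`:

```
Q*(s,a) = ψ_s(a) + γ G*_s(φ_s(a))
        ≈ const + ψ_s(a) + γ⟨∇G*_s(x_ref), φ_s(a)⟩
```

Here `const = γ (G*_s(x_ref) - ⟨∇G*_s(x_ref), x_ref⟩)` collects the terms that do not depend on `a`, so it does not affect the ranking of actions in a given state.
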
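
The first-order score can be checked numerically on a toy problem. The sketch below is illustrative only: the action set, `phi`, `psi`, the affine `G`, and the value of `GAMMA` are all hypothetical stand-ins, not taken from the BTSD paper. Because `G` is affine here, the score should differ from the exact Q-value by a constant and select the same action.

```python
GAMMA = 0.9  # hypothetical discount factor


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def first_order_scores(psi, phi, grad_G_ref, actions, gamma=GAMMA):
    """score(a) = psi(a) + gamma * <grad G(x_ref), phi(a)>.

    The a-independent constant of the expansion is dropped,
    since it does not change the argmax over actions.
    """
    return {a: psi(a) + gamma * dot(grad_G_ref, phi(a)) for a in actions}


# --- toy instance (all names and numbers are assumptions) ---
actions = [0, 1, 2, 3]


def phi(a):
    # hypothetical post-action configuration
    return (float(a), float(3 - a))


def psi(a):
    # hypothetical immediate term
    return -0.5 * a


w, b = (2.0, 1.0), 4.0


def G(x):
    # affine continuation value, so the gradient is w everywhere
    return dot(w, x) + b


def Q(a):
    # exact Q-value for the toy instance
    return psi(a) + GAMMA * G(phi(a))


scores = first_order_scores(psi, phi, w, actions)
gaps = [Q(a) - scores[a] for a in actions]   # identical for every action
best_exact = max(actions, key=Q)
best_score = max(actions, key=scores.get)
```

With a curved `G`, the gaps would no longer be constant across actions, and the two argmax results could differ when actions move far from `x_ref`.
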
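## Higher-Order Generalization

For a K-th order expansion, use multi-index notation `m = (m1,...,md)`:

```
Q*(s,a) ≈ const + ψ_s(a) + γ Σ_{1≤|m|≤K} ∇^m G*_s(x_ref) · (φ_s(a) - x_ref)^m / m!
```

Expanding the centered powers yields monomials `φ_s(a)^m`, which are the entries of the **tensor-product** features `φ_s(a)^{⊗k}` with `k = |m|`. The [[action-decoder|decoder]] can use these as a richer feature representation.
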
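
The multi-index sum can be made concrete with a small sketch. The helper names `multi_indices` and `taylor_features`, the quadratic test function `G`, and its hand-computed derivatives are assumptions chosen for illustration. Since `G` is a degree-2 polynomial, the K = 2 expansion should reproduce it exactly.

```python
from itertools import product
from math import factorial, prod


def multi_indices(d, K):
    """All multi-indices m in N^d with 1 <= |m| <= K."""
    return [m for m in product(range(K + 1), repeat=d) if 1 <= sum(m) <= K]


def taylor_features(x, x_ref, K):
    """Map x to {m: (x - x_ref)^m / m!} for every multi-index up to order K."""
    dx = [xi - ri for xi, ri in zip(x, x_ref)]
    feats = {}
    for m in multi_indices(len(x), K):
        monomial = prod(di ** mi for di, mi in zip(dx, m))
        m_factorial = prod(factorial(mi) for mi in m)
        feats[m] = monomial / m_factorial
    return feats


def G(x):
    # hypothetical continuation value: G(x) = x0^2 + x0 * x1
    return x[0] ** 2 + x[0] * x[1]


x_ref = (1.0, 1.0)
# partial derivatives of G at x_ref, computed by hand
derivs = {(1, 0): 3.0, (0, 1): 1.0, (2, 0): 2.0, (1, 1): 1.0, (0, 2): 0.0}

x = (3.0, 2.0)
feats = taylor_features(x, x_ref, K=2)
approx = G(x_ref) + sum(derivs[m] * f for m, f in feats.items())
```

The number of features grows as `C(d + K, K) - 1`, which is why low orders are usually the practical choice when the post-action dimension `d` is large.
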
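## Theoretical Significance

- The **structural approximation error** is controlled by the Taylor remainder; for the first-order expansion, `ε_approx ∝ |∇^2 G*_s| · ‖φ_s(a) - x_ref‖^2`
- When G* is close to linear (as in many queueing systems), the first-order approximation is nearly exact
- When G* has significant curvature, higher-order terms must be retained
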
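
The quadratic scaling in the first bullet follows from the standard second-order Taylor remainder bound. As a sketch, assuming `G*_s` is twice continuously differentiable on the segment between `x_ref` and `x = φ_s(a)` (a regularity assumption not stated in this note):

```latex
\left| G^*_s(x) - G^*_s(x_{\mathrm{ref}})
      - \langle \nabla G^*_s(x_{\mathrm{ref}}),\, x - x_{\mathrm{ref}} \rangle \right|
\;\le\; \tfrac{1}{2}\,
\sup_{t \in [0,1]}
\left\| \nabla^2 G^*_s\!\big(x_{\mathrm{ref}} + t\,(x - x_{\mathrm{ref}})\big) \right\|_{\mathrm{op}}
\, \left\| x - x_{\mathrm{ref}} \right\|^2,
\qquad x = \varphi_s(a).
```

Multiplying by `γ` gives the corresponding bound on the Q-function error. If the Hessian vanishes (affine `G*_s`), the right-hand side is zero, which matches the near-exactness claim for nearly linear `G*`.
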
## Performance Guarantee

```
|J(π*) - J(π_BTSD)| ≤ ε_approx(G*) + ε_learn(DRL)
```

The first term depends only on the intrinsic structure of the Q-function (the Taylor remainder); the second depends on the learning capability of the DRL algorithm.

## References

- [[bellman-taylor-score-decoding|BTSD]]
- [[continuation-value-function|continuation value function]]
- [[post-action-configuration|post-action configuration]]