20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/spectral-mdp-decomposition.md
+++ b/concepts/spectral-mdp-decomposition.md
@@ -0,0 +1,48 @@
+---
+title: "Spectral MDP Decomposition"
+created: 2026-06-17
+updated: 2026-06-17
+type: concept
+tags: [reinforcement-learning, theory, representation-learning, mdp]
+sources: [raw/papers/naveen-repmt-sac-2026.md]
+confidence: high
+---
+
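+# Spectral MDP Decomposition
+
+Spectral MDP decomposition expresses an MDP's reward function and Q-function as **linear combinations of a feature map φ**. [[repmt-sac|RepMT-SAC]] extends it to the multi-task setting: φ is task-invariant, while the weights w are task-specific.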
+
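+## Definition
+
+An MDP admits a spectral decomposition if there exist a feature map φ(s,a) and weight vectors θ(τ) and w^π(τ) for task τ such that: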
+
+```
+r(s,a,τ) = ⟨φ(s,a), θ(τ)⟩    (reward decomposition)
+Q^π(s,a;τ) = ⟨φ(s,a), w^π(τ)⟩   (Q-function decomposition)
+```
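+
+One way to see why the Q-function decomposition follows from the reward decomposition is the linear-MDP argument sketched below. It rests on assumptions not stated in the definition above: the transition density also factorizes as P(s'|s,a) = ⟨φ(s,a), µ(s')⟩, γ is a discount factor, and V^π(s';τ) is the state value of policy π on task τ.
+
+```latex
+\begin{aligned}
+Q^\pi(s,a;\tau)
+  &= r(s,a,\tau) + \gamma \int P(s' \mid s,a)\, V^\pi(s';\tau)\, ds' \\
+  &= \langle \phi(s,a), \theta(\tau) \rangle
+     + \gamma \int \langle \phi(s,a), \mu(s') \rangle\, V^\pi(s';\tau)\, ds' \\
+  &= \Big\langle \phi(s,a),\; \theta(\tau) + \gamma \int \mu(s')\, V^\pi(s';\tau)\, ds' \Big\rangle ,
+\end{aligned}
+\qquad\text{so}\qquad
+w^\pi(\tau) = \theta(\tau) + \gamma \int \mu(s')\, V^\pi(s';\tau)\, ds'.
+```
+
+Under these assumptions the task enters only through θ(τ) and V^π(·;τ), which is why all task dependence can be absorbed into w^π(τ).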
+
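+## Key generalization
+
+Conventional spectral decomposition (e.g., CTRL) assumes w is a fixed vector. RepMT-SAC generalizes it to w(τ), **an explicit function of the task**:
+
+| Aspect | Single-task spectral decomposition | Multi-task generalization |
+|--------|------------------------------------|---------------------------|
+| φ(s,a) | Task-specific | Task-invariant |
+| w | Fixed vector | w(τ), explicitly task-dependent |
+| Generalization | None | Zero-shot + few-shot |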
+
+## Learning method
+
+φ and µ(s') are learned approximately via **spectral conditional density estimation**:
+
+```
+min_{φ,µ} -E[ log (exp⟨φ(s,a),µ(s')⟩ / Σ_{s''} exp⟨φ(s,a),µ(s'')⟩) ]
+```
+
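+This resembles the softmax cross-entropy used in contrastive learning: φ and µ are fit so that their inner product, passed through the softmax, approximates the transition density P(s'|s,a).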
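+
+A short derivation may help show why minimizing this objective fits the transition density. It is a sketch under an assumption not stated in the source: the sum in the denominator runs over every candidate next state s'' (for example, a finite state space). The symbol p_{φ,µ} is notation introduced here for the softmax model.
+
+```latex
+p_{\phi,\mu}(s' \mid s,a)
+  = \frac{\exp\langle \phi(s,a), \mu(s') \rangle}
+         {\sum_{s''} \exp\langle \phi(s,a), \mu(s'') \rangle}
+
+\begin{aligned}
+\mathcal{L}(\phi,\mu)
+  &= -\,\mathbb{E}_{(s,a)}\,\mathbb{E}_{s' \sim P(\cdot \mid s,a)}
+       \big[ \log p_{\phi,\mu}(s' \mid s,a) \big] \\
+  &= \mathbb{E}_{(s,a)} \Big[
+       \mathrm{KL}\big( P(\cdot \mid s,a) \,\|\, p_{\phi,\mu}(\cdot \mid s,a) \big)
+       + H\big( P(\cdot \mid s,a) \big) \Big]
+\end{aligned}
+```
+
+The entropy term H does not depend on φ or µ, so minimizing the loss is the same as minimizing the expected KL divergence between the true transition distribution and the softmax model. If the denominator is instead estimated from a sampled batch of negatives, this identity holds only approximately.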
+
+## References
+
+- [[rep-mt-sac|RepMT-SAC]]
+- [[task-invariant-representation|Task-invariant representation]]
+- [[multitask-rl|Multi-task RL]]