20260617: currently 914 pages
48
concepts/spectral-mdp-decomposition.md
Normal file
@@ -0,0 +1,48 @@
---
title: "Spectral MDP Decomposition"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [reinforcement-learning, theory, representation-learning, mdp]
sources: [raw/papers/naveen-repmt-sac-2026.md]
confidence: high
---

# Spectral MDP Decomposition

Spectral MDP decomposition represents an MDP's reward function and Q-function as **linear combinations of a feature map φ**. [[repmt-sac|RepMT-SAC]] extends it to the multi-task setting: φ is task-invariant, while the weights w are task-specific.

## Definition

An MDP admits a spectral decomposition if there exist a feature map φ and task-dependent weights θ(τ) and w^π(τ) such that:

```
r(s,a,τ) = ⟨φ(s,a), θ(τ)⟩ (reward decomposition)
Q^π(s,a;τ) = ⟨φ(s,a), w^π(τ)⟩ (Q-function decomposition)
```

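To make the two inner products concrete, here is a minimal sketch on a toy 2-state, 2-action MDP. The feature table `PHI`, the weight tables `THETA` and `W_PI`, and the task names are all hypothetical values chosen for illustration; in particular, the Q-weights are not derived from any Bellman equation.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product <u, v> of two equal-length vectors."""
    assert len(u) == len(v)
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 3-dimensional feature map phi(s, a), shared by every task.
PHI: Dict[Tuple[int, int], Vector] = {
    (0, 0): [1.0, 0.0, 0.5],
    (0, 1): [0.0, 1.0, 0.5],
    (1, 0): [0.5, 0.5, 0.0],
    (1, 1): [0.0, 0.0, 1.0],
}

# Hypothetical task-specific weights: theta(tau) for rewards, w^pi(tau) for Q-values.
THETA: Dict[str, Vector] = {"task_a": [1.0, 0.0, 0.0], "task_b": [0.0, 2.0, 1.0]}
W_PI: Dict[str, Vector] = {"task_a": [2.0, 0.5, 1.0], "task_b": [0.5, 3.0, 2.0]}


def reward(s: int, a: int, task: str) -> float:
    """r(s, a, tau) = <phi(s, a), theta(tau)>."""
    return dot(PHI[(s, a)], THETA[task])


def q_value(s: int, a: int, task: str) -> float:
    """Q^pi(s, a; tau) = <phi(s, a), w^pi(tau)>."""
    return dot(PHI[(s, a)], W_PI[task])
```

Switching tasks only swaps the weight vector; the lookup into `PHI` is identical for both tasks.
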
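## Key generalization

Conventional spectral decomposition (e.g., CTRL) assumes w is a fixed vector. RepMT-SAC generalizes w to w(τ), an **explicit function of the task**:

| Aspect | Single-task spectral decomposition | Multi-task generalization |
|------|------------|----------|
| φ(s,a) | Task-specific | Task-invariant |
| w | Fixed vector | w(τ), explicitly task-dependent |
| Generalization | None | Zero-shot + few-shot |
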
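One way to see why an explicit w(τ) permits zero-shot evaluation is the sketch below. It assumes, purely for illustration, that w(τ) is a linear map `M` applied to a task descriptor vector `tau`. The source does not specify this parametric form, and `M`, `phi_sa`, and both descriptors are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(u: Vector, v: Vector) -> float:
    """Inner product <u, v> of two equal-length vectors."""
    assert len(u) == len(v)
    return sum(a * b for a, b in zip(u, v))


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product, one inner product per row."""
    return [dot(row, v) for row in m]


# Assumed (hypothetical) parametrization: w(tau) = M @ tau, with M learned on training tasks.
M: Matrix = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, -1.0],
]

# Frozen, task-invariant feature phi(s, a) for a single state-action pair.
phi_sa: Vector = [1.0, 2.0]


def task_weights(tau: Vector) -> Vector:
    """w(tau) as an explicit function of the task descriptor."""
    return matvec(M, tau)


def q_value(phi: Vector, tau: Vector) -> float:
    """Q(s, a; tau) = <phi(s, a), w(tau)>."""
    return dot(phi, task_weights(tau))


seen_task: Vector = [1.0, 0.0, 0.0]    # a descriptor used during training
unseen_task: Vector = [0.5, 0.5, 1.0]  # a new descriptor, evaluated with no retraining
```

Here `q_value(phi_sa, unseen_task)` needs no additional learning: the new descriptor passes through the same map that was fitted on the training tasks. With a fixed vector w, there would be nothing to evaluate at a new τ.
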
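## Learning

φ and µ(s') are learned approximately via **spectral conditional density estimation**:

```
min_{φ,µ} -E[ log (exp⟨φ(s,a),µ(s')⟩ / Σ_{s''} exp⟨φ(s,a),µ(s'')⟩) ]
```

Like the softmax cross-entropy used in contrastive learning, this objective trains φ and µ so that their inner product, after softmax normalization over candidate next states, approximates the transition density P(s'|s,a).
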
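The objective can be sketched in plain Python as follows. This is an illustration under one assumption that the source does not state: the sum over s'' is taken over the other next states in the same batch (in-batch negatives), so the positive pair for row i is column i. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product <u, v> of two equal-length vectors."""
    assert len(u) == len(v)
    return sum(a * b for a, b in zip(u, v))


def log_softmax_at(scores: Vector, target: int) -> float:
    """log of softmax(scores)[target], using the max-shift trick for stability."""
    shift = max(scores)
    log_norm = shift + math.log(sum(math.exp(x - shift) for x in scores))
    return scores[target] - log_norm


def spectral_contrastive_loss(phis: List[Vector], mus: List[Vector]) -> float:
    """Batch average of -log( exp<phi_i, mu_i> / sum_j exp<phi_i, mu_j> ).

    phis[i] is phi(s_i, a_i) and mus[i] is mu(s'_i) for the i-th transition.
    Assumption: every mus[j] in the batch serves as a candidate s''.
    """
    assert len(phis) == len(mus)
    n = len(phis)
    total = 0.0
    for i in range(n):
        scores = [dot(phis[i], mus[j]) for j in range(n)]
        total -= log_softmax_at(scores, i)
    return total / n


# Uninformative features: every candidate scores the same, so the loss is log(batch size).
loss_uniform = spectral_contrastive_loss([[0.0, 0.0], [0.0, 0.0]],
                                         [[0.0, 0.0], [0.0, 0.0]])

# Well-aligned features: each phi_i scores highest against its own mu_i.
loss_aligned = spectral_contrastive_loss([[5.0, 0.0], [0.0, 5.0]],
                                         [[5.0, 0.0], [0.0, 5.0]])
```

The uniform case gives the chance-level loss log(2) for a batch of two, while the aligned case drives the loss toward zero. That gap is the signal gradient descent on φ and µ would follow.
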
## References

- [[rep-mt-sac|RepMT-SAC]]
- [[task-invariant-representation|Task-invariant representation]]
- [[multitask-rl|Multi-task RL]]