20260617: currently 914 pages
concepts/rep-mt-sac.md
@@ -0,0 +1,50 @@
---
title: "RepMT-SAC"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [reinforcement-learning, multi-task, representation-learning, algorithm]
sources: [raw/papers/naveen-repmt-sac-2026.md]
confidence: high
---
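
# RepMT-SAC: Representation-Based Multi-Task SAC

RepMT-SAC is a multi-task RL algorithm proposed by [[repmt-sac|Naveen et al. (2026)]]. It builds on [[soft-actor-critic|SAC]] and introduces a [[spectral-mdp-decomposition|spectral MDP decomposition]] to decouple task-invariant dynamics from task-specific objectives.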

## Core decomposition

```
Q(s,a;τ) = ⟨φ(s,a), w(τ)⟩
```

- `φ(s,a)`: task-invariant representation (shared dynamics)
- `w(τ)`: task-conditioned encoding (task-specific reward)
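
As a minimal sketch of the decomposition above, the snippet below evaluates Q as an inner product of a shared feature vector and a per-task weight vector. The functions `phi` and `w`, the task names, and all numbers are hand-written toy stand-ins (assumptions for illustration, not taken from the paper); in RepMT-SAC both components would be learned.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    """Inner product <u, v> of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def phi(s: float, a: float) -> List[float]:
    # Toy task-invariant features (assumption): shared by every task.
    return [1.0, s, a, s * a]


def w(task: str) -> List[float]:
    # Toy task-conditioned weights (assumption): one vector per task.
    table = {
        "reach": [0.0, 1.0, 0.5, 0.0],
        "avoid": [1.0, -1.0, 0.0, 0.25],
    }
    return table[task]


def q_value(s: float, a: float, task: str) -> float:
    # Q(s, a; tau) = <phi(s, a), w(tau)>
    return dot(phi(s, a), w(task))


# Same state-action pair, same features, different task weights.
q_reach = q_value(2.0, 1.0, "reach")
q_avoid = q_value(2.0, 1.0, "avoid")
print(q_reach, q_avoid)
```

Only `w` changes between the two calls; `phi(2.0, 1.0)` is computed identically for both tasks, which is the sharing the decomposition is meant to express.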

## Two stages

### Upstream

- Jointly learn φ, µ (an auxiliary representation), and w(τ;θ)
- The TD target is linear in φ, so training is very stable
- The maximum-entropy policy π(a|s,τ) is derived from the linear Q
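
To illustrate how a maximum-entropy policy can be read off a linear Q, here is a small sketch that assumes a finite action set and a temperature `alpha`, so that π(a|s,τ) ∝ exp(Q(s,a;τ)/α). This Boltzmann form is the standard soft-RL closed form for discrete actions; the source does not say whether RepMT-SAC uses it directly or a parametric actor, and the feature rows and weights below are made-up values.

```python
import math
from typing import List


def soft_policy(q_values: List[float], alpha: float = 1.0) -> List[float]:
    """Boltzmann policy pi(a) proportional to exp(Q(a) / alpha)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def soft_value(q_values: List[float], alpha: float = 1.0) -> float:
    """Soft state value V = alpha * log(sum_a exp(Q(a) / alpha))."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


# Hypothetical features phi(s, a) for three actions in one state,
# and a hypothetical task weight vector w(tau).
phi_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_tau = [1.0, 2.0]

# Linear Q-values: Q(s, a; tau) = <phi(s, a), w(tau)>
q_values = [sum(f * wi for f, wi in zip(row, w_tau)) for row in phi_rows]

probs = soft_policy(q_values, alpha=1.0)
v_soft = soft_value(q_values, alpha=1.0)
print(q_values, probs, v_soft)
```

Lower `alpha` concentrates probability on the highest-Q action, while higher `alpha` pushes the policy toward uniform.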

### Downstream

- φ and µ are **frozen**
- Only w(τ_new) and π_new are fine-tuned
- Fast few-shot adaptation to OOD tasks
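
The downstream idea can be sketched as fitting only the new task's weight vector while the features stay fixed. The example below is plain least-squares regression by SGD onto given Q targets; it is a simplification under stated assumptions, not the paper's fine-tuning procedure. The function `fit_task_weights`, the feature rows, and the targets are all hypothetical, and the policy update for π_new is omitted.

```python
from typing import List


def fit_task_weights(
    features: List[List[float]],
    targets: List[float],
    lr: float = 0.1,
    epochs: int = 500,
) -> List[float]:
    """Fit w so that <phi_i, w> matches target_i; the features are never modified."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        for x, y in zip(features, targets):
            err = sum(xi * wi for xi, wi in zip(x, w)) - y
            # Gradient step on 0.5 * err**2 with respect to w only.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


# Frozen features phi(s, a) for four samples from the new task (made up).
frozen_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
# Targets generated from a hidden weight vector [0.5, -1.0] (made up).
q_targets = [0.5, -1.0, -0.5, 0.0]

w_new = fit_task_weights(frozen_features, q_targets)
print(w_new)
```

Because Q is linear in w once φ is frozen, this is a convex problem over a handful of parameters, which is one way to see why only a few samples may be needed.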
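
## Key advantages

| Dimension | Standard SAC | RepMT-SAC |
|-----------|--------------|-----------|
| Task relationship | Independent | Shared φ, task-specific w |
| Q-learning | Nonlinear | Linear in w once φ is frozen |
| OOD adaptation | Requires retraining | Fine-tune a small number of parameters |
| Theoretical basis | None | Guarantees from the spectral decomposition |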

## References

- [[spectral-mdp-decomposition|Spectral MDP decomposition]]
- [[soft-actor-critic|SAC]]
- [[multitask-rl|Multi-task RL]]
- [[repmt-sac|Paper]]