---
title: "RepMT-SAC Paper Integration Review"
created: 2026-06-17
type: review
---
# 📌 Basic Information
- **Paper**: Learning to Adapt: Representation-Based RL for Multi-Task Skill Transfer
- **Authors**: Aryan Naveen (MIT), Haitong Ma, Haldun Balim, Na Li - Harvard SEAS
- **Field**: cs.RO / Multi-Task RL
- **arXiv**: 2606.12890v1 (2026-06-11)
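# 🎯 Core Concepts
1. **[[spectral-mdp-decomposition|Spectral MDP Decomposition]]**: Q(s,a;τ) = ⟨φ(s,a), w(τ)⟩, where φ is task-invariant and w is task-specific
2. **[[task-invariant-representation|Task-Invariant Representation]]**: learns the shared dynamics via contrastive conditional density estimation
3. **[[rep-mt-sac|RepMT-SAC]]**: a two-stage SAC; the upstream stage learns φ, the downstream stage freezes φ and fine-tunes w
4. **[[quadrotor-trajectory-following|Quadrotor Trajectory Following]]**: physical validation using a Legendre-polynomial parameterization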
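
To make the factorization in item 1 concrete, here is a minimal NumPy sketch of Q(s,a;τ) = ⟨φ(s,a), w(τ)⟩. Everything beyond that formula is an assumption for illustration: the dimensions, the random-projection `phi`, the linear `w`, and the names `q_value`, `W_PHI`, and `W_TASK` are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
STATE_DIM, ACTION_DIM, TASK_DIM, FEATURE_DIM = 4, 2, 3, 8

# Stand-in for the learned task-invariant features phi(s, a):
# a fixed random projection followed by tanh (an assumption, not the paper's network).
W_PHI = rng.normal(size=(FEATURE_DIM, STATE_DIM + ACTION_DIM))

def phi(state, action):
    """Task-invariant features: a function of (s, a) only, never of the task."""
    return np.tanh(W_PHI @ np.concatenate([state, action]))

# Stand-in for the task-specific weights w(tau): a linear map of the task
# descriptor (an assumption; any function of tau would fit the formula).
W_TASK = rng.normal(size=(FEATURE_DIM, TASK_DIM))

def w(task):
    """Task-specific weights: a function of the task descriptor only."""
    return W_TASK @ task

def q_value(state, action, task):
    """Q(s, a; tau) as the inner product <phi(s, a), w(tau)>."""
    return float(phi(state, action) @ w(task))

state = rng.normal(size=STATE_DIM)
action = rng.normal(size=ACTION_DIM)
task_a = rng.normal(size=TASK_DIM)
task_b = rng.normal(size=TASK_DIM)

# The same features are reused for both tasks; only w(tau) changes.
q_a = q_value(state, action, task_a)
q_b = q_value(state, action, task_b)
print(q_a, q_b)
```

Switching tasks in this sketch touches only `w(task)`, while `phi(state, action)` is computed the same way every time; that split is what the upstream/downstream stages in item 3 rely on.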
# 🔗 Concept Network
```
Spectral MDP Decomposition → Task-Invariant Repr (φ)
↓ ↓
Task Distribution (µ) → RepMT-SAC ← Soft Actor-Critic
↓ ↓
Task-Conditioned Policy → Upstream-Downstream Learning
Quadrotor Trajectory Following
```
**Links to existing knowledge**: connected to existing wiki concepts through [[multitask-rl]] and [[few-shot-learning]].
# 📚 Wiki Integration
- **New pages**: 10 (1 paper + 8 concepts + 1 raw)
- **Total size**: 892 → 901 pages (+9)
- New coverage: cs.RO / robot control
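# 💡 Key Insights
1. **Once φ is frozen, Q-learning reduces to linear regression**: this is RepMT-SAC's most elegant engineering property. Downstream adaptation is very fast and very stable, avoiding the training instability that deep RL commonly shows on new tasks.
2. **The generalization of the spectral decomposition is subtle but important**: promoting w from a "fixed vector" to an "explicit function of the task", w(τ), makes the representation genuinely multi-task rather than merely sharing parameters across tasks.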
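
The first insight can be illustrated with a small NumPy sketch: with φ frozen, fitting w to a fixed set of Q targets is an ordinary least-squares problem with a closed-form solution. This is a sketch under stated assumptions, not the paper's training loop: the features and targets are synthetic, the ridge term and all names (`features`, `targets`, `true_w`, `w_hat`) are hypothetical, and in an actual SAC critic update the targets are bootstrapped, so a regression of this kind would be repeated as the targets change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N sampled (s, a) pairs, FEATURE_DIM frozen features each.
N, FEATURE_DIM = 200, 8

# Stand-in for frozen features phi(s, a) evaluated on a batch (synthetic data).
features = rng.normal(size=(N, FEATURE_DIM))

# Synthetic regression targets from a known weight vector plus small noise,
# so the recovered weights can be checked against the ground truth.
true_w = rng.normal(size=FEATURE_DIM)
targets = features @ true_w + 0.01 * rng.normal(size=N)

# Closed-form ridge regression: w = (Phi^T Phi + lambda I)^{-1} Phi^T y.
# The small ridge term is an assumption added for numerical stability.
ridge = 1e-3
gram = features.T @ features + ridge * np.eye(FEATURE_DIM)
w_hat = np.linalg.solve(gram, features.T @ targets)

print(np.max(np.abs(w_hat - true_w)))
```

Because the problem is linear in w, there is no deep network to optimize at this stage, which is consistent with the fast and stable downstream adaptation described above.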