---
title: "Review: Predictive Representations for Scalable Multitask Deep RL"
created: 2026-06-10
type: review
paper: "[[predictive-representations-scalable-mtrl]]"
---
# Review: Predictive Representations for Scalable Multitask Deep RL
📌 **Basic information**
- Paper: Representation Learning Enables Scalable Multitask Deep RL
- Authors: Obando-Ceron, Li, Fujimoto, Bacon, Courville, Castro (Mila / McGill / Google DeepMind)
- Area: deep RL × multitask learning × representation learning
- arXiv: 2606.05555v1 [cs.LG, cs.AI], 2026-06-04
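🎯 **Core contributions**
1. **Exposes the scaling bottleneck**: pure model-free RL gains nothing from a larger model and can even degrade; once predictive representations are added, performance keeps improving with scale. Representation quality is therefore the real bottleneck for scaling.
2. **MR.Q outperforms Newt**: model-free + predictive representations (no planning) beats the world-model + planning Newt baseline on all 10 MMBench domains.
3. **Clarifies where model-based gains come from**: planning is not required; the benefit comes from the representations learned through predictive objectives.
🔗 **Concept network**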
```
Predictive Representation Learning → MR.Q Algorithm
↓ ↓
Representation Learning in RL → Multitask RL → Deep RL Scaling
↓ ↓
Auxiliary Predictive Objectives World Models RL → Model-Free RL
```
📊 **Wiki integration**
- New pages: 9 (1 paper + 8 concepts)
- Link integrity: 100%
- Total size: 719 → **728**
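💡 **Key insight**
The value of this paper is that it **clears the fog around model-based RL**. The gains claimed by methods such as Dreamer, TD-MPC2, and Newt have long been attributed to "learning a world model + planning", but Obando-Ceron et al. show through carefully designed ablations that **planning is irrelevant**: what actually drives performance is the dense representation-learning signal provided by the predictive objectives.
This has direct implications for engineering practice: rather than spending compute on latent-space rollouts, spend it on better auxiliary predictive objectives. MR.Q's simplicity and efficiency (better performance than Newt with lower wall-clock time) is a win for the KISS principle in RL.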
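
To make "predictive objectives without planning" concrete, here is a minimal pure-Python sketch of a per-transition loss that adds auxiliary next-latent and reward prediction terms to a one-step TD loss. It is not MR.Q's actual implementation: the linear layers, the separate value head used for bootstrapping, the dimensions, and every name (`make_params`, `combined_loss`, `aux_weight`) are assumptions chosen only for illustration.

```python
import random


def linear(weights, x):
    """Apply a bias-free dense layer: returns weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def make_params(rng, obs_dim=4, act_dim=2, latent_dim=3):
    """Randomly initialise the hypothetical linear modules."""
    def init(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "encoder": init(latent_dim, obs_dim),
        "q_head": init(1, latent_dim + act_dim),
        "v_head": init(1, latent_dim),
        "dynamics": init(latent_dim, latent_dim + act_dim),
        "reward": init(1, latent_dim + act_dim),
    }


def combined_loss(params, transition, gamma=0.99, aux_weight=1.0):
    """TD loss plus auxiliary predictive losses for one transition.

    There are no latent rollouts and no planning: the dynamics and reward
    heads exist only to give the encoder a dense training signal.
    """
    obs, action, reward, next_obs = transition
    z = linear(params["encoder"], obs)            # latent state
    z_next = linear(params["encoder"], next_obs)  # next latent, used as a target
    za = z + action                               # concatenate latent and action

    # Model-free part: squared one-step TD error.
    q = linear(params["q_head"], za)[0]
    v_next = linear(params["v_head"], z_next)[0]
    td = (q - (reward + gamma * v_next)) ** 2

    # Auxiliary part: predict the next latent and the reward from (z, a).
    dynamics = mse(linear(params["dynamics"], za), z_next)
    reward_err = (linear(params["reward"], za)[0] - reward) ** 2

    total = td + aux_weight * (dynamics + reward_err)
    return {"total": total, "td": td, "dynamics": dynamics, "reward": reward_err}


if __name__ == "__main__":
    rng = random.Random(0)
    params = make_params(rng)
    transition = ([0.1, 0.2, 0.3, 0.4], [1.0, 0.0], 0.5, [0.2, 0.1, 0.0, 0.3])
    print(combined_loss(params, transition))
```

Setting `aux_weight=0.0` reduces this to the pure model-free loss, which is the kind of switch an ablation separating "predictive representation" from "planning" would toggle. A real agent would minimise the total with gradient descent and would typically stop gradients through the target latent.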