SidneyZhang/myWiki

Sidney Zhang 91fac5b6fc

20260617: currently 914 pages

2026-06-17 15:02:40 +08:00

title	created	type	paper
Review: Predictive Representations for Scalable Multitask Deep RL	2026-06-10	review	predictive-representations-scalable-mtrl

Review: Predictive Representations for Scalable Multitask Deep RL

📌 Basic Information

Paper: Representation Learning Enables Scalable Multitask Deep RL
Authors: Obando-Ceron, Li, Fujimoto, Bacon, Courville, Castro (Mila / McGill / Google DeepMind)
Area: Deep RL × Multitask Learning × Representation Learning
arXiv: 2606.05555v1 [cs.LG, cs.AI], 2026-06-04

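🎯 Core Contributions

Identifies the scaling bottleneck: in pure model-free RL, enlarging the model yields no gain or even degrades performance; once predictive representations are added, performance keeps improving with model size → representation quality is the real bottleneck for scaling
MR.Q outperforms Newt: model-free + predictive representations (no planning) beats the world-model + planning Newt baseline on all 10 MMBench domains
Clarifies where model-based gains come from: planning is not required; the benefit comes from the representations learned through predictive objectives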
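
To make "model-free + predictive representations" concrete, the sketch below combines a TD value loss with an auxiliary loss that predicts the next latent state and the reward. It is a generic illustration under stated assumptions, not the MR.Q loss from the paper: the toy linear encoder, the additive latent dynamics, and the names `combined_loss`, `aux_weight`, `w`, `m`, and `r_pred` are all hypothetical.

```python
# Illustrative only: a generic "TD loss + auxiliary predictive loss" objective.
# The encoder, dynamics model, and weights are hypothetical stand-ins,
# not the MR.Q architecture described in the paper.

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def encode(obs, w):
    """Toy linear encoder: scales every observation feature by w."""
    return [w * x for x in obs]

def predict_next_latent(z, action, m):
    """Toy latent dynamics: shifts every latent feature by m * action."""
    return [zi + m * action for zi in z]

def combined_loss(obs, action, reward, next_obs, q_value, q_target,
                  w=0.5, m=0.1, r_pred=0.0, aux_weight=1.0):
    """Return (total, td_loss, aux_loss) for a single transition."""
    z = encode(obs, w)
    z_next_target = encode(next_obs, w)        # would be a fixed target in practice
    z_next_pred = predict_next_latent(z, action, m)

    td_loss = (q_value - q_target) ** 2        # model-free value loss
    dynamics_loss = mse(z_next_pred, z_next_target)  # predict the next latent
    reward_loss = (r_pred - reward) ** 2       # predict the reward
    aux_loss = dynamics_loss + reward_loss     # predictive signal for the encoder

    return td_loss + aux_weight * aux_loss, td_loss, aux_loss

total, td, aux = combined_loss(
    obs=[1.0, 2.0], action=1.0, reward=1.0, next_obs=[1.2, 2.2],
    q_value=0.5, q_target=1.0,
)
print(total, td, aux)
```

No latent rollouts or action search appear anywhere in this objective: the dynamics and reward predictors exist only to shape the representation, which is the role the review attributes to predictive objectives. Setting `aux_weight=0.0` recovers the pure model-free loss.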

🔗 Concept Network

Predictive Representation Learning → MR.Q Algorithm
         ↓                               ↓
Representation Learning in RL → Multitask RL → Deep RL Scaling
         ↓                               ↓
Auxiliary Predictive Objectives    World Models RL → Model-Free RL

📊 Wiki Integration

New pages: 9 (1 paper + 8 concepts)
Link integrity: 100%
Total size: 719 → 728 pages

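💡 Key Insights

The value of this paper lies in clearing the fog around model-based RL. The benefits claimed by methods such as Dreamer, TD-MPC2, and Newt have long been attributed to "learning a world model + planning", but Obando-Ceron et al. show through carefully designed ablations that planning is irrelevant: what actually drives performance is the dense representation-learning signal provided by the predictive objectives.

This has direct implications for engineering practice: rather than spending compute on latent-space rollouts, spend it on better auxiliary predictive objectives. MR.Q is both simple and efficient (better performance than Newt at lower wall-clock time), a win for the KISS principle in RL.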
