20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/predictive-representation-learning.md
+++ b/concepts/predictive-representation-learning.md
@@ -0,0 +1,50 @@
+---
+title: "Predictive Representation Learning"
+created: 2026-06-10
+updated: 2026-06-10
+type: concept
+tags: ["deep-rl", "representation-learning", "self-supervised-learning"]
+sources: ["[[predictive-representations-scalable-mtrl]]"]
+---
+
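+# Predictive Representation Learning
+
+**Predictive representation learning** is the central thesis of [[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]]: what drives scalability in multi-task RL is learning **representations that are predictive of future states and rewards**, not explicit planning.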
+
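+## Core Intuition
+
+Conventional RL learns representations from the reward signal alone, which is sparse and non-stationary. Predictive objectives provide **dense auxiliary supervision**:
+- Predict the next state z_{t+1}
+- Predict the immediate reward r_t
+- Predict the termination signal d_t
+
+These objectives force the encoder to capture the environment dynamics and the task-relevant temporal structure.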
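+
+As a rough sketch, the three prediction objectives above can be added to the RL loss as weighted auxiliary terms. The weights $\lambda_z, \lambda_r, \lambda_d$, the prediction heads $f_z, f_r, f_d$, the encoder $\phi$, and the stop-gradient on the target are illustrative assumptions, not details taken from the paper:
+
+$$
+\mathcal{L} = \mathcal{L}_{\text{RL}} + \lambda_z \lVert f_z(z_t, a_t) - \operatorname{sg}(z_{t+1}) \rVert_2^2 + \lambda_r \bigl(f_r(z_t, a_t) - r_t\bigr)^2 + \lambda_d \, \operatorname{BCE}\bigl(f_d(z_t, a_t),\, d_t\bigr)
+$$
+
+Here $z_t = \phi(s_t)$ is the encoded state, $a_t$ is the action, $\operatorname{sg}$ denotes stop-gradient, and $\operatorname{BCE}$ is binary cross-entropy. Every transition contributes to the last three terms, even when the reward is zero.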
+
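+## Relationship to Model-Based RL
+
+| Model-Based RL | Predictive Representation Learning |
+|---------------|------------|
+| Learns a world model and plans with it | Learns a world model, used only to shape the representation |
+| Latent-space rollouts / MCTS | No planning |
+| Model errors compound | Model errors affect only representation quality |
+| High compute cost | Low compute cost |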
+
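+## Why It Works
+
+1. **Dense supervision**: every transition supplies a prediction target, rather than relying only on sparse rewards
+2. **Representation structure**: forces the latent space to capture causal and temporal relationships
+3. **TD stability**: better representations reduce TD variance
+4. **Cross-task sharing**: dynamics prediction is task-agnostic, which promotes transfer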
+
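+## Key Experimental Evidence
+
+Key findings from [[predictive-representations-scalable-mtrl|Obando-Ceron et al.]]:
+- PPO without predictive representations → no gain from scaling the model
+- PPO + predictive representations → performance keeps improving with scale
+- MR.Q (predictive representations + model-free TD) outperforms Newt (world model + planning)
+
+## References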
+- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
+- [[mrq-algorithm|MR.Q]]
+- [[auxiliary-predictive-objectives|Auxiliary Predictive Objectives]]
+- [[representation-learning-rl|Representation Learning in RL]]