---
title: "Predictive Representation Learning"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "representation-learning", "self-supervised-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---
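# Predictive Representation Learning
**Predictive representation learning** is the central thesis of [[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]]: what drives scalability in multitask RL is learning **representations that are predictive of future states and rewards**, not explicit planning.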
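## Core Intuition
Conventional RL learns representations from the reward signal alone, which is sparse and non-stationary. Predictive objectives provide **dense auxiliary supervision**:
- Predict the next state z_{t+1}
- Predict the immediate reward r_t
- Predict the termination signal d_t

These objectives force the encoder to capture the environment dynamics and the task-relevant temporal structure.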
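
The three objectives above can be sketched as a single auxiliary loss. This is a minimal illustration, not the method from the source: the linear encoder and heads, the squared-error and cross-entropy losses, the unweighted sum, and all names (`params`, `encode`, `predictive_loss`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, LATENT_DIM = 8, 2, 4

# Hypothetical linear encoder and prediction heads (illustrative only).
params = {
    "enc": rng.normal(scale=0.1, size=(OBS_DIM, LATENT_DIM)),
    "dyn": rng.normal(scale=0.1, size=(LATENT_DIM + ACT_DIM, LATENT_DIM)),
    "rew": rng.normal(scale=0.1, size=(LATENT_DIM + ACT_DIM, 1)),
    "done": rng.normal(scale=0.1, size=(LATENT_DIM + ACT_DIM, 1)),
}


def encode(params, obs):
    """Map observations to latent states z."""
    return np.tanh(obs @ params["enc"])


def predictive_loss(params, obs, act, rew, done, next_obs):
    """Return the three auxiliary losses and their unweighted sum."""
    z = encode(params, obs)
    # Assumption: the next latent is a fixed target; a real implementation
    # would typically block gradients through it.
    z_next_target = encode(params, next_obs)
    za = np.concatenate([z, act], axis=-1)

    z_next_pred = za @ params["dyn"]             # predicts z_{t+1}
    r_pred = (za @ params["rew"]).squeeze(-1)    # predicts r_t
    d_logit = (za @ params["done"]).squeeze(-1)  # logit for d_t

    dyn_loss = np.mean(np.sum((z_next_pred - z_next_target) ** 2, axis=-1))
    rew_loss = np.mean((r_pred - rew) ** 2)
    # Numerically stable binary cross-entropy on logits.
    done_loss = np.mean(
        np.maximum(d_logit, 0.0) - d_logit * done
        + np.log1p(np.exp(-np.abs(d_logit)))
    )
    return {
        "dynamics": dyn_loss,
        "reward": rew_loss,
        "termination": done_loss,
        "total": dyn_loss + rew_loss + done_loss,
    }


# A random batch of transitions (s_t, a_t, r_t, d_t, s_{t+1}).
batch = 16
obs = rng.normal(size=(batch, OBS_DIM))
act = rng.uniform(-1.0, 1.0, size=(batch, ACT_DIM))
rew = rng.normal(size=batch)
done = (rng.uniform(size=batch) < 0.1).astype(float)
next_obs = rng.normal(size=(batch, OBS_DIM))

losses = predictive_loss(params, obs, act, rew, done, next_obs)
```

Every transition in the batch contributes to all three terms, which is the sense in which the supervision is dense: the encoder receives a learning signal even when `rew` is zero almost everywhere.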
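## Relationship to Model-Based RL
| Model-Based RL | Predictive Representation Learning |
|---------------|------------|
| Learns a world model and plans with it | Learns a world model and uses it only to shape the representation |
| Latent-space rollouts / MCTS | No planning |
| Model errors compound | Model errors affect only representation quality |
| High compute cost | Low compute cost |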
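
The compute contrast in the table can be made concrete by counting world-model queries per decision. The sketch below is an assumption-laden illustration: it substitutes a simple random-shooting planner for the rollout/MCTS planners named in the table, and the linear model, reward head, and policy (`dyn_w`, `rew_w`, `policy_w`) are hypothetical stand-ins.

```python
import numpy as np

LATENT_DIM, ACT_DIM = 4, 2
rng = np.random.default_rng(0)

# Hypothetical linear components, randomly initialised for illustration.
dyn_w = rng.normal(scale=0.1, size=(LATENT_DIM + ACT_DIM, LATENT_DIM))
rew_w = rng.normal(scale=0.1, size=(LATENT_DIM + ACT_DIM,))
policy_w = rng.normal(scale=0.1, size=(LATENT_DIM, ACT_DIM))


def act_model_free(z):
    """Predictive-representation setting: the policy reads the latent directly."""
    action = np.tanh(z @ policy_w)
    model_calls = 0  # the world model is not queried at decision time
    return action, model_calls


def act_with_planning(z, horizon=5, n_candidates=64):
    """Model-based setting: random-shooting planner over latent rollouts."""
    plans = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, ACT_DIM))
    latents = np.repeat(z[None, :], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    model_calls = 0
    for t in range(horizon):
        za = np.concatenate([latents, plans[:, t]], axis=-1)
        returns += za @ rew_w   # predicted reward for this step
        latents = za @ dyn_w    # predicted next latent; errors compound here
        model_calls += n_candidates
    best = int(np.argmax(returns))
    return plans[best, 0], model_calls  # execute the first action of the best plan


z = rng.normal(size=LATENT_DIM)
free_action, free_calls = act_model_free(z)
plan_action, plan_calls = act_with_planning(z)
```

With these settings the planner issues `horizon * n_candidates` model queries per action, while the model-free path issues none. The planner also feeds each predicted latent back into the model, which is where prediction errors can compound; in the representation-only setting the model is used during training and never rolled out for control.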
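## Why It Works
1. **Dense supervision**: every transition supplies prediction targets, instead of relying on sparse rewards alone
2. **Representation structure**: forces the latent space to capture causal and temporal relationships
3. **TD stability**: better representations reduce TD variance
4. **Cross-task sharing**: dynamics prediction is task-agnostic, which promotes transfer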
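## Key Experimental Evidence
Core findings of [[predictive-representations-scalable-mtrl|Obando-Ceron et al.]]:
- PPO without predictive representations → no gain from scaling the model
- PPO + predictive representations → performance keeps improving with scale
- MR.Q (predictive representations + model-free TD) outperforms Newt (world model + planning)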
## References
- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[mrq-algorithm|MR.Q]]
- [[auxiliary-predictive-objectives|Auxiliary Predictive Objectives]]
- [[representation-learning-rl|Representation Learning in RL]]