20260617: currently 914 pages
concepts/predictive-representation-learning.md
@@ -0,0 +1,50 @@
---
title: "Predictive Representation Learning"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "representation-learning", "self-supervised-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---
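
# Predictive Representation Learning

**Predictive representation learning** is the central thesis of [[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]]: what drives scalability in multitask RL is learning **representations that are predictive of future states and rewards**, not explicit planning.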
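
## Core Intuition

Conventional RL learns representations from the reward signal alone, which is sparse and non-stationary. Predictive objectives provide **dense auxiliary supervision**:
- Predict the next state z_{t+1}
- Predict the immediate reward r_t
- Predict the termination signal d_t

These objectives force the encoder to capture the environment dynamics and task-relevant temporal structure.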
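
The three prediction targets above can be combined into one auxiliary loss. The following is a minimal NumPy sketch, not the method from the source: the function name `predictive_aux_loss`, the squared-error losses for z_{t+1} and r_t, the binary cross-entropy for d_t, and the equal weighting of the three terms are all assumptions made for illustration.

```python
import numpy as np


def predictive_aux_loss(z_pred_next, z_next, r_pred, r, d_logit, d):
    """Hypothetical auxiliary loss over a batch of transitions.

    z_pred_next, z_next: (batch, latent_dim) predicted and target next latents
    r_pred, r:           (batch,) predicted and observed immediate rewards
    d_logit, d:          (batch,) termination logits and 0/1 termination flags
    """
    # Next-state term: squared error summed over latent dims, averaged over batch.
    dyn_loss = np.mean(np.sum((z_pred_next - z_next) ** 2, axis=-1))
    # Reward term: mean squared error.
    rew_loss = np.mean((r_pred - r) ** 2)
    # Termination term: binary cross-entropy on logits, written in the
    # numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    done_loss = np.mean(
        np.maximum(d_logit, 0.0) - d_logit * d + np.log1p(np.exp(-np.abs(d_logit)))
    )
    # Equal weighting is an assumption; real systems typically tune these weights.
    total = dyn_loss + rew_loss + done_loss
    return total, (dyn_loss, rew_loss, done_loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, latent_dim = 8, 16
    total, parts = predictive_aux_loss(
        rng.normal(size=(batch, latent_dim)),
        rng.normal(size=(batch, latent_dim)),
        rng.normal(size=batch),
        rng.normal(size=batch),
        rng.normal(size=batch),
        rng.integers(0, 2, size=batch).astype(float),
    )
    print(total, parts)
```

Every transition in the batch contributes to all three terms, which is what makes the supervision dense even when most rewards are zero.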

## Relationship to Model-Based RL

| Model-Based RL | Predictive Representation Learning |
|---------------|------------|
| Learns a world model and plans with it | Learns a world model, used only to shape the representation |
| Latent-space rollouts / MCTS | No planning |
| Model errors compound | Model errors affect only representation quality |
| High compute cost | Low compute cost |
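
The "model errors compound" row can be illustrated with a toy example. In this sketch the true latent dynamics are a 2D rotation and the learned model is the same rotation with a small angle bias; both the setup and the names `rotation` and `rollout_errors` are hypothetical and not taken from the source.

```python
import numpy as np


def rotation(theta):
    """2x2 rotation matrix, used here as toy linear latent dynamics."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def rollout_errors(horizon, theta=0.3, model_bias=0.01):
    """Distance between true and model-predicted latents at each rollout step."""
    true_step = rotation(theta)
    model_step = rotation(theta + model_bias)  # slightly wrong learned model
    z_true = np.array([1.0, 0.0])
    z_model = z_true.copy()
    errors = []
    for _ in range(horizon):
        z_true = true_step @ z_true
        # The model is fed its own previous prediction, as in a planning rollout.
        z_model = model_step @ z_model
        errors.append(float(np.linalg.norm(z_model - z_true)))
    return errors


if __name__ == "__main__":
    errs = rollout_errors(horizon=20)
    print(errs[0], errs[-1])
```

With these defaults the error is about 0.01 after one step and about 0.2 after twenty. A planner that rolls the model forward consumes the later, larger errors, whereas a one-step prediction used only as a training signal is exposed to the first-step error alone.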
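
## Why It Works

1. **Dense supervision**: every transition supplies prediction targets, rather than relying on sparse rewards alone
2. **Representation structure**: forces the latent space to capture causal and temporal relations
3. **TD stability**: better representations reduce the variance of TD learning
4. **Cross-task sharing**: dynamics prediction is task-agnostic, which promotes transfer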
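
One way to write the combined objective behind these points, as a sketch only: the weight $\lambda$, the hatted symbols for the model's predictions, and the specific loss forms are assumptions, not notation from the source.

$$
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{TD}}(\theta) + \lambda \sum_{t} \Big[ \lVert \hat{z}_{t+1} - z_{t+1} \rVert^2 + (\hat{r}_t - r_t)^2 + \mathrm{BCE}(\hat{d}_t, d_t) \Big]
$$

The sum runs over every transition, so the bracketed terms supply a learning signal even where the reward is zero, and none of them depends on a task-specific value function.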
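
## Key Experimental Evidence

Core findings of [[predictive-representations-scalable-mtrl|Obando-Ceron et al.]]:
- PPO without predictive representations → no gain from scaling the model
- PPO + predictive representations → performance keeps improving with scale
- MR.Q (predictive representations + model-free TD) outperforms Newt (world model + planning)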

## References

- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[mrq-algorithm|MR.Q]]
- [[auxiliary-predictive-objectives|Auxiliary Predictive Objectives]]
- [[representation-learning-rl|Representation Learning in RL]]