SidneyZhang/myWiki

Sidney Zhang 91fac5b6fc

20260617: currently 914 pages

2026-06-17 15:02:40 +08:00

title: Predictive Representation Learning
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: deep-rl, representation-learning, self-supervised-learning
sources: predictive-representations-scalable-mtrl

Predictive Representation Learning

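Predictive representation learning is the central thesis of predictive-representations-scalable-mtrl: what drives scalability in multi-task RL is learning representations that are predictive of future states and rewards, not explicit planning.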

Core intuition

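Conventional RL learns representations from the reward signal alone, which is sparse and non-stationary. Predictive objectives add dense auxiliary supervision: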

Predict the next state z_{t+1}
Predict the immediate reward r_t
Predict the termination signal d_t

These objectives force the encoder to capture the environment's dynamics and the task-relevant temporal structure.
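The three objectives above can be combined into one auxiliary loss per transition. The sketch below is a minimal pure-Python illustration, assuming a hypothetical linear encoder, a linear latent-dynamics head, a linear reward head and a termination head trained with binary cross-entropy. The names `PredictiveModel` and `predictive_loss`, the equal loss weights and the squared-error choices are illustrative assumptions, not the exact architecture or loss used in predictive-representations-scalable-mtrl.

```python
import math
from dataclasses import dataclass

Matrix = list[list[float]]
Vector = list[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


@dataclass
class PredictiveModel:
    """Hypothetical linear encoder plus three prediction heads (illustration only)."""
    enc: Matrix   # state s_t -> latent z_t
    dyn: Matrix   # [z_t, a_t] -> predicted next latent z_{t+1}
    rew: Vector   # [z_t, a_t] -> predicted reward r_t
    term: Vector  # [z_t, a_t] -> logit of the termination signal d_t


def predictive_loss(m: PredictiveModel, s: Vector, a: Vector, r: float,
                    s_next: Vector, done: float,
                    weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of next-latent MSE, reward MSE and termination BCE for one transition."""
    z = matvec(m.enc, s)
    za = z + a  # list concatenation: [z_t, a_t]

    # Next-latent target: the encoder applied to s_{t+1}. Real implementations
    # often stop gradients through this target or use a separate target encoder.
    z_next_target = matvec(m.enc, s_next)
    z_next_pred = matvec(m.dyn, za)
    dyn_loss = sum((p - t) ** 2 for p, t in zip(z_next_pred, z_next_target)) / len(z_next_target)

    rew_loss = (dot(m.rew, za) - r) ** 2

    # Numerically stable binary cross-entropy on the termination logit.
    logit = dot(m.term, za)
    term_loss = max(logit, 0.0) - logit * done + math.log1p(math.exp(-abs(logit)))

    w_dyn, w_rew, w_term = weights
    return w_dyn * dyn_loss + w_rew * rew_loss + w_term * term_loss


# Usage: 2-d state, 1-d action, identity encoder.
model = PredictiveModel(
    enc=[[1.0, 0.0], [0.0, 1.0]],
    dyn=[[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]],
    rew=[0.0, 0.0, 1.0],
    term=[0.0, 0.0, 0.0],
)
loss = predictive_loss(model, s=[0.0, 1.0], a=[1.0], r=1.0, s_next=[0.5, 1.0], done=0.0)
print(round(loss, 4))  # → 0.6931
```

In this example the dynamics and reward heads hit their targets exactly, so only the termination term (log 2 for a zero logit) remains. In a real setup the gradient of this loss also flows into the encoder, which is how the prediction targets shape the representation.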

Relation to Model-Based RL

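Model-Based RL	Predictive representation learning
Learns a world model and plans with it	Learns a world model but uses it only to shape the representation
Latent-space rollouts / MCTS	No planning
Model errors compound over rollouts	Model errors affect only representation quality
High computational cost	Low computational cost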
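The row on model errors can be illustrated with a toy scalar system. Assume hypothetical linear dynamics x_{t+1} = a·x_t and a learned model whose coefficient is 5% too large. An open-loop rollout of the model on its own predictions (as in planning) accumulates error geometrically, while one-step predictions made from real states (as when the model only serves as a training signal) keep the one-step error. This is an illustration of the general effect under those assumptions, not a measurement from the paper.

```python
def rollout(a: float, x0: float, steps: int) -> list[float]:
    """Open-loop rollout of the scalar dynamics x_{t+1} = a * x_t."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1])
    return xs


A_TRUE = 1.0    # hypothetical true dynamics coefficient
A_MODEL = 1.05  # hypothetical learned model, biased by 5%
HORIZON = 10

true_traj = rollout(A_TRUE, 1.0, HORIZON)

# Planning-style use: the model is rolled out on its own predictions,
# so its one-step bias compounds across the horizon.
model_traj = rollout(A_MODEL, 1.0, HORIZON)
rollout_err = [abs(m - t) for m, t in zip(model_traj, true_traj)]

# Representation-style use (in this toy): the model only makes one-step
# predictions from real states, so each error stays at the one-step level.
one_step_err = [abs(A_MODEL * x - A_TRUE * x) for x in true_traj[:-1]]

print(f"rollout error after {HORIZON} steps: {rollout_err[-1]:.3f}")  # → 0.629
print(f"max one-step error: {max(one_step_err):.3f}")                 # → 0.050
```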

Why it works

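Dense supervision: every transition carries prediction targets, instead of relying only on sparse rewards
Representation structure: the latent space is forced to capture causal and temporal relationships
TD stability: better representations reduce TD variance
Cross-task sharing: dynamics prediction is task-agnostic, which promotes transfer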
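The cross-task sharing point can be sketched with a least-squares toy. Assume two hypothetical tasks that share the same dynamics x_{t+1} = 0.9·x_t but have opposite reward functions (r = x and r = -x). A single dynamics model fitted on the pooled data recovers the shared coefficient, whereas a single reward model fitted on the pooled data averages the tasks away, so rewards need task-specific heads. Linear regression through the origin stands in here for the neural prediction heads used in practice; the tasks, states and coefficients are all made up for illustration.

```python
def fit_through_origin(pairs: list[tuple[float, float]]) -> float:
    """Least-squares slope w for y ≈ w * x (no intercept)."""
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den


A_TRUE = 0.9  # shared (hypothetical) dynamics: x_{t+1} = 0.9 * x_t
states = [1.0, 2.0, -0.5, 3.0]

# Two hypothetical tasks: identical transitions (x_t, x_{t+1}), opposite rewards r_t.
task_a = [(x, A_TRUE * x, +x) for x in states]
task_b = [(x, A_TRUE * x, -x) for x in states]
pooled = task_a + task_b

# The dynamics target does not depend on the task, so pooling data is consistent.
a_hat = fit_through_origin([(x, x_next) for x, x_next, _ in pooled])

# The reward target is task-specific: one pooled reward model cancels out.
w_pooled = fit_through_origin([(x, r) for x, _, r in pooled])
w_a = fit_through_origin([(x, r) for x, _, r in task_a])
w_b = fit_through_origin([(x, r) for x, _, r in task_b])

print(f"shared dynamics: {a_hat:.3f}, pooled reward weight: {w_pooled:.3f}")
# → shared dynamics: 0.900, pooled reward weight: 0.000
```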

Key experimental evidence

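Core findings of predictive-representations-scalable-mtrl:

PPO without predictive representations → scaling up the model brings no gain
PPO with predictive representations → performance keeps improving with scale
MR.Q (predictive representations + model-free TD) outperforms Newt (world model + planning)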

References