---
title: "World Models in RL"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["model-based-rl", "deep-rl", "world-models", "planning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---
# World Models in RL
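**World Models** are a paradigm in model-based RL for learning a model of the environment's dynamics: the agent learns a transition function in latent space and uses that model for planning or simulation.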
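
As a minimal sketch of "learning a transition function in latent space and using it for simulation", the snippet below rolls a toy model forward without touching a real environment. The names `encode`, `transition`, and `imagine`, and the fixed linear maps standing in for learned networks, are assumptions made for illustration; they are not taken from any of the methods discussed on this page.

```python
from typing import List, Sequence

Vector = List[float]


def encode(observation: Sequence[float]) -> Vector:
    """Hypothetical encoder: maps an observation to a latent state (a fixed scaling here)."""
    return [0.5 * x for x in observation]


def transition(latent: Vector, action: float) -> Vector:
    """Hypothetical learned transition z' = f(z, a) (a fixed linear map here)."""
    return [0.9 * z + 0.1 * action for z in latent]


def imagine(observation: Sequence[float], actions: Sequence[float]) -> List[Vector]:
    """Roll the model forward in latent space; no environment step is taken."""
    latent = encode(observation)
    trajectory = [latent]
    for action in actions:
        latent = transition(latent, action)
        trajectory.append(latent)
    return trajectory


# Three imagined steps produce four latent states (the initial one plus three predictions).
trajectory = imagine([2.0, -4.0], actions=[1.0, 0.0, -1.0])
```

In a real agent both functions would be neural networks trained from data; only the control flow of the rollout is meant to carry over.
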
## Representative methods
| Method | Core idea |
|------|------|
| Dreamer (Hafner et al.) | RSSM + latent-space imagination |
| TD-MPC2 | Temporal-difference learning + MPC planning |
| Newt (Hansen et al., 2026) | Large-scale multitask world model |
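## Advantages
1. **Dense supervision**: predicting future states provides a rich learning signal
2. **Sample efficiency**: latent-space rollouts reduce the amount of environment interaction needed
3. **Planning capability**: the agent can make lookahead decisions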
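
The lookahead idea in the last item can be sketched as a brute-force search over short action sequences inside the model. This is a toy under stated assumptions: the scalar latent, the discrete action set, and the functions `toy_dynamics` and `toy_reward` are hypothetical, and the methods listed on this page use more sophisticated planners than exhaustive enumeration.

```python
from itertools import product
from typing import Callable, Sequence


def plan_first_action(
    latent: float,
    dynamics: Callable[[float, float], float],
    reward: Callable[[float, float], float],
    actions: Sequence[float] = (-1.0, 0.0, 1.0),
    horizon: int = 3,
) -> float:
    """Score every action sequence in the model and return the first action of the best one."""
    best_return, best_first = float("-inf"), actions[0]
    for sequence in product(actions, repeat=horizon):
        z, total = latent, 0.0
        for a in sequence:
            total += reward(z, a)   # predicted reward for this imagined step
            z = dynamics(z, a)      # predicted next latent state
        if total > best_return:
            best_return, best_first = total, sequence[0]
    return best_first


def toy_dynamics(z: float, a: float) -> float:
    """Hypothetical model: the action shifts the latent state."""
    return z + a


def toy_reward(z: float, a: float) -> float:
    """Hypothetical reward: penalize distance of the next state from zero."""
    return -((z + a) ** 2)


# Starting at z = 2.0, the best plan moves toward zero, so the first action is -1.0.
action = plan_first_action(2.0, toy_dynamics, toy_reward)
```

The cost of the search grows as `len(actions) ** horizon`, which is one concrete way to see why planning adds wall-clock time.
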
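## Costs
1. **Computational overhead**: latent-space rollouts and planning increase wall-clock time
2. **Compounding model error**: the longer the rollout, the less accurate the predictions
3. **Hyperparameter sensitivity**: planning horizon, number of rollouts, and similar settings
4. **Implementation complexity**: a world model, a policy, and a value function all have to be maintained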
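
The error-accumulation point can be made concrete with a small worked example. The scalar linear system and the 5% one-step gain error below are assumed numbers chosen for illustration, not measurements from any method.

```python
from typing import List


def rollout(x0: float, gain: float, steps: int) -> List[float]:
    """Apply x' = gain * x repeatedly and return all visited states."""
    states = [x0]
    for _ in range(steps):
        states.append(gain * states[-1])
    return states


TRUE_GAIN = 1.0    # hypothetical true dynamics: the state stays constant
MODEL_GAIN = 1.05  # hypothetical learned model with a 5% one-step error

true_states = rollout(1.0, TRUE_GAIN, steps=20)
model_states = rollout(1.0, MODEL_GAIN, steps=20)

# Absolute prediction error at each rollout depth: |1.05**k - 1| for depth k.
errors = [abs(m - t) for m, t in zip(model_states, true_states)]
```

Here a one-step error of about 0.05 grows to roughly 1.65 after 20 imagined steps, which is why the planning horizon and rollout length need careful tuning.
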
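## Core controversy
[[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]] argue that the benefit of world models **comes mainly from predictive representation learning** rather than from planning itself. MR.Q (no planning, predictive representations only) outperforms Newt (world model + planning) in both efficiency and performance.
This suggests that current model-based RL methods may be using a sledgehammer to crack a nut: planning may be an unnecessary computational burden.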
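
To illustrate what "predictive representations without planning" can mean, the sketch below computes a generic latent self-prediction loss: the model is trained to predict the encoding of the next observation, and no rollout is used at decision time. This is an assumed, simplified objective and not necessarily the loss used by MR.Q or Newt; `toy_encode` and `toy_predict` are hypothetical stand-ins for learned networks.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Transition = Tuple[Sequence[float], float, Sequence[float]]  # (obs, action, next_obs)


def latent_prediction_loss(
    encode: Callable[[Sequence[float]], Vector],
    predict: Callable[[Vector, float], Vector],
    batch: Sequence[Transition],
) -> float:
    """Mean over the batch of the squared error between predicted and encoded next latents."""
    total = 0.0
    for obs, action, next_obs in batch:
        predicted = predict(encode(obs), action)
        target = encode(next_obs)  # commonly computed without gradients in practice
        total += sum((p - t) ** 2 for p, t in zip(predicted, target))
    return total / len(batch)


def toy_encode(observation: Sequence[float]) -> Vector:
    """Hypothetical encoder (a fixed scaling)."""
    return [0.5 * x for x in observation]


def toy_predict(latent: Vector, action: float) -> Vector:
    """Hypothetical latent predictor (the action shifts every latent dimension)."""
    return [z + action for z in latent]


batch = [
    ([2.0, 4.0], 1.0, [4.0, 6.0]),  # predicted perfectly: contributes 0
    ([0.0, 0.0], 1.0, [0.0, 0.0]),  # off by 1 in both dimensions: contributes 2
]
loss = latent_prediction_loss(toy_encode, toy_predict, batch)
```

The learned encoder can then feed an ordinary model-free policy and value function, so the dense supervision is kept while the planning cost is dropped.
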
## References
- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[model-free-rl|Model-Free RL]]
- [[predictive-representation-learning|Predictive Representation Learning]]