20260617: currently 914 pages
44
concepts/world-models-rl.md
Normal file
@@ -0,0 +1,44 @@
---
title: "World Models in RL"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["model-based-rl", "deep-rl", "world-models", "planning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---
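
# World Models in RL

**World Models** are the model-based RL paradigm of learning a model of the environment's dynamics: the agent learns a transition function in a latent space and uses that model for planning or for simulated rollouts.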
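
The definition above can be made concrete with a minimal sketch, assuming a linear encoder, a linear latent transition function, and a linear reward head. All names here (`encode`, `step`, `imagine`, the weight matrices) are hypothetical, and the weights are random rather than trained; this is not the architecture of any particular method.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, ACT_DIM = 8, 4, 2

# Hypothetical, randomly initialised "networks"; a real world model would be trained.
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))
W_dyn = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM + ACT_DIM))
w_rew = rng.normal(scale=0.1, size=LATENT_DIM + ACT_DIM)


def encode(obs):
    """Map an observation to a latent state."""
    return np.tanh(W_enc @ obs)


def step(z, a):
    """Latent transition function: predict the next latent state and the reward."""
    za = np.concatenate([z, a])
    return np.tanh(W_dyn @ za), float(w_rew @ za)


def imagine(obs, actions):
    """Roll the model forward in latent space without touching the environment."""
    z = encode(obs)
    latents, rewards = [z], []
    for a in actions:
        z, r = step(z, a)
        latents.append(z)
        rewards.append(r)
    return np.stack(latents), np.array(rewards)


obs = rng.normal(size=OBS_DIM)
actions = rng.uniform(-1.0, 1.0, size=(5, ACT_DIM))
latents, rewards = imagine(obs, actions)
print(latents.shape, rewards.shape)  # (6, 4) (5,)
```

Only `encode` sees a real observation; every later step is predicted by the model, which is what makes such rollouts usable for planning or simulation.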

## Representative methods

| Method | Core idea |
|------|------|
| Dreamer (Hafner et al.) | RSSM + latent-space imagination |
| TD-MPC2 | Temporal-difference learning + MPC planning |
| Newt (Hansen et al., 2026) | Large-scale multitask world model |
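
## Advantages

1. **Dense supervision**: predicting future states provides a rich learning signal
2. **Sample efficiency**: latent-space rollouts reduce the number of environment interactions needed
3. **Planning**: the model enables lookahead decision-making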
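
Lookahead decision-making can be sketched as follows, assuming a toy 1-D task in which `model` is a hand-written stand-in for a learned world model. The names `model`, `plan`, `GOAL`, and `ACTIONS` are hypothetical, and exhaustive search over a tiny discrete action set is used only to keep the example deterministic; practical planners typically sample action sequences instead of enumerating them.

```python
from itertools import product

GOAL = 1.0
ACTIONS = (-1.0, 0.0, 1.0)


def model(x, a):
    """Hypothetical stand-in for a learned model: returns (next state, reward)."""
    x_next = x + 0.5 * a
    return x_next, -(x_next - GOAL) ** 2


def plan(x0, horizon=3):
    """Return the first action of the best imagined action sequence."""

    def imagined_return(seq):
        # Sum of rewards predicted by the model along one action sequence.
        x, total = x0, 0.0
        for a in seq:
            x, r = model(x, a)
            total += r
        return total

    best = max(product(ACTIONS, repeat=horizon), key=imagined_return)
    return best[0]


print(plan(0.0), plan(1.0), plan(2.0))  # 1.0 0.0 -1.0
```

Only the first action of the best sequence is executed; replanning at every step is what makes this an MPC-style loop.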
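
## Costs

1. **Compute overhead**: latent-space rollouts and planning add wall-clock time
2. **Compounding model error**: the longer the rollout, the less accurate the predictions
3. **Hyperparameter sensitivity**: planning horizon, number of rollouts, etc.
4. **Implementation complexity**: a world model, a policy, and a value function must all be maintained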
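
The compounding-error cost can be illustrated with a small numerical sketch. It assumes a scalar system whose true dynamics keep the state constant and a model whose one-step prediction is 5% too large; both factors are made-up values chosen for illustration, not measurements from any method.

```python
def rollout(x0, factor, horizon):
    """Apply x <- factor * x repeatedly and return the visited states."""
    xs, x = [], x0
    for _ in range(horizon):
        x = factor * x
        xs.append(x)
    return xs


TRUE_FACTOR = 1.0    # hypothetical true dynamics: the state stays put
MODEL_FACTOR = 1.05  # hypothetical learned model: 5% one-step error

true_traj = rollout(1.0, TRUE_FACTOR, 20)
model_traj = rollout(1.0, MODEL_FACTOR, 20)

# Absolute prediction error at each rollout depth.
errors = [abs(m - t) for m, t in zip(model_traj, true_traj)]

for h in (1, 5, 10, 20):
    print(f"horizon {h:2d}: abs error {errors[h - 1]:.3f}")
```

Because each step feeds the model its own previous prediction, the error grows with depth instead of staying at the one-step level, which is one reason the planning horizon is a sensitive hyperparameter.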
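
## Core debate

[[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]] argue that the benefits of world models **come mainly from predictive representation learning**, not from planning itself. MR.Q (no planning, predictive representations only) outperforms Newt (world model + planning) in both efficiency and performance.

This suggests that current model-based RL methods may be overkill, with planning an unnecessary computational burden.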
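
To show what "predictive representations without planning" can look like structurally, here is a minimal sketch. It is an assumption-laden illustration and not the MR.Q objective: the encoder, the per-action latent predictor, the Q head, and all names are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, N_ACTIONS = 6, 3, 4

# Hypothetical linear components with random weights.
W_enc = rng.normal(scale=0.3, size=(LATENT_DIM, OBS_DIM))
W_dyn = rng.normal(scale=0.3, size=(N_ACTIONS, LATENT_DIM, LATENT_DIM))
W_q = rng.normal(scale=0.3, size=(N_ACTIONS, LATENT_DIM))


def encode(obs):
    """Shared representation used by both the predictor and the Q head."""
    return np.tanh(W_enc @ obs)


def prediction_loss(obs, action, next_obs):
    """Auxiliary loss: predict the next latent from the current latent and action."""
    z_pred = W_dyn[action] @ encode(obs)
    z_target = encode(next_obs)  # would be treated as a fixed target during training
    return float(np.mean((z_pred - z_target) ** 2))


def act(obs):
    """Model-free action selection: one Q evaluation, no rollouts, no planning."""
    return int(np.argmax(W_q @ encode(obs)))


obs, next_obs = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
loss = prediction_loss(obs, 2, next_obs)
action = act(obs)
print(f"aux loss {loss:.4f}, greedy action {action}")
```

In this setup the dynamics predictor only shapes the encoder during training; at decision time `act` never calls it, so the cost of rollouts and planning disappears.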

## References
- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[model-free-rl|Model-Free RL]]
- [[predictive-representation-learning|Predictive Representation Learning]]