20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/world-models-rl.md
+++ b/concepts/world-models-rl.md
@@ -0,0 +1,44 @@
+---
+title: "World Models in RL"
+created: 2026-06-10
+updated: 2026-06-10
+type: concept
+tags: ["model-based-rl", "deep-rl", "world-models", "planning"]
+sources: ["[[predictive-representations-scalable-mtrl]]"]
+---
+
+# World Models in RL
+
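+**World Models** are a model-based RL paradigm in which the agent learns a model of the environment's dynamics: it learns a transition function in a latent space and uses that model for planning or simulation.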
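+
+As an illustrative formalization (the symbols $h$, $d$, $R$, $V$, $\gamma$, and $H$ are notation assumed here, not taken from the source), a latent world model and a lookahead planning objective can be written as:
+
+$$
+z_0 = h(s_0), \qquad \hat{z}_0 = z_0, \qquad \hat{z}_{t+1} = d(\hat{z}_t, a_t), \qquad \hat{r}_t = R(\hat{z}_t, a_t)
+$$
+
+$$
+a_{0:H-1}^{*} = \arg\max_{a_{0:H-1}} \; \sum_{t=0}^{H-1} \gamma^{t}\, \hat{r}_t + \gamma^{H}\, V(\hat{z}_H)
+$$
+
+Here $h$ encodes an observation into a latent state, $d$ is the learned transition function, $R$ predicts reward, and $V$ is a terminal value estimate. This is one common form, not necessarily the exact objective of any method listed on this page.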
+
+## Representative Methods
+
+| Method | Core idea |
+|------|------|
+| Dreamer (Hafner et al.) | RSSM + imagination in latent space |
+| TD-MPC2 | Temporal-difference learning + MPC planning |
+| Newt (Hansen et al., 2026) | Large-scale multitask world model |
+
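+## Advantages
+
+1. **Dense supervision**: predicting future states provides a rich learning signal
+2. **Sample efficiency**: latent-space rollouts reduce the number of environment interactions required
+3. **Planning**: the model enables lookahead decision-making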
+
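+## Costs
+
+1. **Compute overhead**: latent-space rollouts and planning increase wall-clock time
+2. **Compounding model error**: the longer the rollout, the less accurate the predictions
+3. **Hyperparameter sensitivity**: planning horizon, number of rollouts, etc.
+4. **Implementation complexity**: a world model, a policy, and a value function must all be maintained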
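+
+A rough sketch of why rollout error compounds, under assumptions not stated in the source: suppose the true latent dynamics $f$ is $L$-Lipschitz in the state, the learned model $\hat{f}$ has one-step error at most $\epsilon$, and the rollout starts from the correct state ($e_0 = 0$). Writing $e_t = \lVert \hat{z}_t - z_t \rVert$:
+
+$$
+e_{t+1} \le \lVert \hat{f}(\hat{z}_t, a_t) - f(\hat{z}_t, a_t) \rVert + \lVert f(\hat{z}_t, a_t) - f(z_t, a_t) \rVert \le \epsilon + L\, e_t
+\quad\Longrightarrow\quad
+e_H \le \epsilon \sum_{k=0}^{H-1} L^{k}
+$$
+
+For $L > 1$ the bound grows geometrically with the horizon $H$; for $L = 1$ it grows linearly as $H\epsilon$. This is a worst-case bound, not a prediction of the actual error.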
+
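+## Core Controversy
+
+[[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]] argue that the benefits of world models come **mainly from predictive representation learning** rather than from planning itself. MR.Q (no planning, predictive representations only) outperforms Newt (world model + planning) in both efficiency and performance.
+
+This suggests that current model-based RL methods may be using a sledgehammer to crack a nut: planning may be an unnecessary computational burden.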
+
+## References
+- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
+- [[model-free-rl|Model-Free RL]]
+- [[predictive-representation-learning|Predictive Representation Learning]]