20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/model-free-rl.md
+++ b/concepts/model-free-rl.md
@@ -0,0 +1,48 @@
+---
+title: "Model-Free Reinforcement Learning (Model-Free RL)"
+created: 2026-06-10
+updated: 2026-06-10
+type: concept
+tags: ["deep-rl", "reinforcement-learning"]
+sources: ["[[predictive-representations-scalable-mtrl]]"]
+---
+
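+# Model-Free Reinforcement Learning (Model-Free RL)
+
+**Model-Free RL** learns a policy or value function directly from experience, without explicitly modeling the environment dynamics. Its counterpart is model-based RL, which learns a transition model T(s'|s,a) and a reward model R(s,a).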
+
+## Classic Algorithms
+
+| Type | Algorithms |
+|------|------|
+| Value-based | DQN, Rainbow |
+| Policy-based | PPO, TRPO |
+| Actor-Critic | TD3, SAC |
+
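+## Advantages
+
+- **Simple**: no world model to maintain
+- **Efficient**: choosing an action takes a single forward pass per step, with no planning
+- **Stable**: no learned-model errors to compound over rollouts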
+
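+## Traditional Limitations
+
+- **Low sample efficiency**: no model to assist learning → more environment interaction is needed
+- **Poor representation quality**: learning is driven only by the TD error → sparse training signal
+- **Poor scaling**: larger networks bring no gain (and can even degrade performance)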
+
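+## New Paradigm: Model-Free + Predictive Representations
+
+[[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]] show that adding [[auxiliary-predictive-objectives|auxiliary predictive objectives]] to a model-free agent (such as [[mrq-algorithm|MR.Q]]) yields all of the following at once:
+
+- The simplicity and efficiency of model-free methods
+- The representation-learning benefits of model-based methods
+- No planning cost
+
+This represents a **third path** between model-free and model-based RL.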
+
+## References
+- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
+- [[mrq-algorithm|MR.Q]]
+- [[world-models-rl|World Models]]
+- [[predictive-representation-learning|Predictive Representation Learning]]
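
The definition at the top of `concepts/model-free-rl.md` (a value function learned directly, with no T(s'|s,a) or R(s,a) model) can be made concrete with a minimal tabular Q-learning sketch. This is only an illustration and is not taken from the note or its sources: the five-state chain environment, the `step` and `q_learning` names, and all hyperparameters are hypothetical choices.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Environment dynamics. The agent only samples from this function;
    it never fits its own transition or reward model."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: uniformly random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: greedy action, ties broken at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # TD target built from the sampled transition only.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Under these assumptions the greedy policy should end up moving right in every non-terminal state. The only learning signal is the TD error on sampled transitions, which is the property the note's limitations section attributes to model-free methods.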