---
title: "Model-Free Reinforcement Learning (Model-Free RL)"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "reinforcement-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---
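# Model-Free Reinforcement Learning (Model-Free RL)
**Model-Free RL** learns a policy or value function directly, without explicitly modeling the environment dynamics. The contrasting approach is model-based RL, which learns a transition model T(s'|s,a) and a reward model R(s,a).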
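To show what "learning a value function directly" means, here is a minimal sketch of tabular Q-learning. The update consumes only sampled transitions (s, a, r, s'). It never stores or queries T(s'|s,a) or R(s,a). The four-state chain environment, the `env_step` and `train` helpers, and the hyperparameters are hypothetical choices for illustration. None of them come from the source.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Tuple

QTable = DefaultDict[Tuple[int, int], float]
N_STATES, N_ACTIONS = 4, 2  # hypothetical toy chain: states 0..3, state 3 is terminal


def env_step(s: int, a: int) -> Tuple[int, float, bool]:
    """Toy environment (assumption): action 1 moves right, action 0 stays put.
    Reaching the last state gives reward 1 and ends the episode."""
    s_next = min(s + a, N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def q_learning_update(q: QTable, s: int, a: int, r: float, s_next: int, done: bool,
                      alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Model-free update from one sampled transition (s, a, r, s').
    No estimate of T(s'|s,a) or R(s,a) is kept; the sample stands in for both."""
    bootstrap = 0.0 if done else max(q[(s_next, b)] for b in range(N_ACTIONS))
    q[(s, a)] += alpha * (r + gamma * bootstrap - q[(s, a)])


def train(episodes: int = 200, seed: int = 0) -> QTable:
    rng = random.Random(seed)
    q: QTable = defaultdict(float)
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 100:  # step cap as a safety guard
            a = rng.randrange(N_ACTIONS)  # uniform random behaviour policy (Q-learning is off-policy)
            s_next, r, done = env_step(s, a)
            q_learning_update(q, s, a, r, s_next, done)
            s, steps = s_next, steps + 1
    return q


q = train()
# Greedy action in each non-terminal state, read off the learned Q-table.
greedy = [max(range(N_ACTIONS), key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)  # → [1, 1, 1]
```

Because Q-learning is off-policy, a uniformly random behaviour policy is enough here. The greedy policy extracted from the learned Q-table still moves right in every non-terminal state.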
## Classic Algorithms
| Type | Algorithms |
|------|------|
| Value-based | DQN, Rainbow |
| Policy-based | PPO, TRPO |
| Actor-Critic | TD3, SAC |
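PPO and TRPO both build on the policy-gradient estimator. The sketch below shows its simplest instance, REINFORCE, on a hypothetical two-armed bandit. The softmax policy parameters are updated directly from sampled rewards, and no value function or environment model is learned. This illustrates the policy-based family only. It is not an implementation of PPO or TRPO, which typically add a learned critic and clipped or trust-region-constrained updates. The function names, bandit rewards, and learning rate are assumptions.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards: List[float], steps: int = 500, lr: float = 0.1,
                     seed: int = 0) -> List[float]:
    """REINFORCE on a deterministic multi-armed bandit (one-step episodes).
    Ascends the policy-gradient estimate r * grad log pi(a); no value function
    and no environment model are learned."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # logits of the softmax policy
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample an action from pi
        r = rewards[a]
        # d/d theta_k of log softmax(theta)[a] = 1[k == a] - probs[k]
        for k in range(len(theta)):
            theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)


probs = reinforce_bandit(rewards=[0.0, 1.0])
print(probs)  # probability mass should concentrate on arm 1
```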
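## Advantages
- **Simple**: no world model to learn or maintain
- **Efficient**: acting takes a single forward pass per step, with no planning or model rollouts
- **Stable**: errors from an imperfect learned model cannot accumulate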
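The "Stable" point is about compounding error. When a learned model is rolled forward for planning, its one-step mistakes feed back into its own later predictions. Here is a minimal sketch of that effect. It assumes hypothetical one-dimensional linear dynamics and a learned model whose coefficient is slightly off. Both functions are illustrative stand-ins, not real environments.

```python
from typing import Callable, List

Dynamics = Callable[[float, float], float]


def rollout(f: Dynamics, s0: float, actions: List[float]) -> List[float]:
    """Roll a dynamics function forward open-loop, feeding each prediction back in."""
    states = [s0]
    for a in actions:
        states.append(f(states[-1], a))
    return states


def true_dynamics(s: float, a: float) -> float:
    return 0.9 * s + a   # hypothetical "real" environment


def learned_model(s: float, a: float) -> float:
    return 0.95 * s + a  # hypothetical learned model with a small coefficient error


actions = [1.0] * 10
truth = rollout(true_dynamics, 1.0, actions)
pred = rollout(learned_model, 1.0, actions)
errors = [abs(p - t) for p, t in zip(pred, truth)]
print(f"1-step error: {errors[1]:.3f}, 10-step error: {errors[10]:.3f}")
# → 1-step error: 0.050, 10-step error: 1.762
```

A model-free agent never performs such rollouts, so this particular failure mode does not arise.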
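## Traditional Limitations
- **Low sample efficiency**: with no model to learn from or plan with, the agent needs more environment interaction
- **Poor representation quality**: representations are shaped only by the TD error, which is a sparse learning signal
- **Poor scaling**: larger networks bring no gains (and can even degrade performance)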
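Here is the "sparse signal" point in concrete form. Take a sparse-reward task with Q initialised to zero. The TD error r + γ·max Q(s', ·) - Q(s, a) then reduces to r, so it is zero on every transition except the rewarded one. The sketch below counts nonzero TD errors over the first episode. It assumes a hypothetical 50-step chain with a single "move right" action and reward only at the goal. In a deep model-free agent, this same TD error is the only gradient that reaches the encoder.

```python
from typing import Dict, List


def first_episode_td_errors(chain_length: int, alpha: float = 0.5,
                            gamma: float = 0.99) -> List[float]:
    """TD errors from one forward pass through a chain with reward only at the goal."""
    q: Dict[int, float] = {}  # state -> value of the single "move right" action; missing = 0.0
    errors: List[float] = []
    for s in range(chain_length):
        done = s == chain_length - 1  # the last transition reaches the goal
        r = 1.0 if done else 0.0      # sparse reward: only at the goal
        bootstrap = 0.0 if done else q.get(s + 1, 0.0)
        delta = r + gamma * bootstrap - q.get(s, 0.0)
        q[s] = q.get(s, 0.0) + alpha * delta  # Q-learning update, applied in forward order
        errors.append(delta)
    return errors


errors = first_episode_td_errors(chain_length=50)
n_nonzero = sum(1 for d in errors if d != 0.0)
print(n_nonzero, "of", len(errors), "transitions give a nonzero TD error")
# → 1 of 50 transitions give a nonzero TD error
```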
## A New Paradigm: Model-Free + Predictive Representations
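[[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]] show that adding [[auxiliary-predictive-objectives|auxiliary predictive objectives]] to a model-free agent (e.g., [[mrq-algorithm|MR.Q]]) yields all of the following at once:
- The simplicity and efficiency of model-free RL
- The representation-learning benefits of model-based RL
- No planning overhead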
This represents a **third path** between model-free and model-based RL.
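Here is a minimal sketch of the general pattern. The model-free TD loss is summed with auxiliary losses that ask the latent state to predict the next latent state and the reward. This is a generic illustration under stated assumptions, not MR.Q's actual objective. The function name, argument names, squared-error losses, and the `aux_weight` coefficient are all hypothetical.

```python
from typing import List


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("length mismatch")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(
    q_pred: float,               # Q(s, a) from the critic
    td_target: float,            # r + gamma * Q_target(s', a'); held fixed in an autodiff version
    z_next_pred: List[float],    # hypothetical dynamics head: predicted latent of s'
    z_next_target: List[float],  # encoder output for the observed s'; held fixed in an autodiff version
    r_pred: float,               # hypothetical reward head: reward predicted from the latent of (s, a)
    r_true: float,               # observed reward
    aux_weight: float = 1.0,     # hypothetical weight on the auxiliary terms
) -> float:
    """Model-free TD loss plus auxiliary predictive losses (generic sketch)."""
    td_loss = (q_pred - td_target) ** 2
    dynamics_loss = mse(z_next_pred, z_next_target)
    reward_loss = (r_pred - r_true) ** 2
    return td_loss + aux_weight * (dynamics_loss + reward_loss)


# Even with zero TD error and zero reward, the dynamics term still supplies a learning signal.
print(combined_loss(q_pred=0.0, td_target=0.0, z_next_pred=[1.0, 0.0],
                    z_next_target=[0.0, 0.0], r_pred=0.0, r_true=0.0))  # → 0.5
```

The auxiliary terms produce gradients even on transitions where the reward and TD error are zero. That is the intuition behind using them to improve representation learning.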
## References
- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[mrq-algorithm|MR.Q]]
- [[world-models-rl|World Models]]
- [[predictive-representation-learning|Predictive Representation Learning]]