---
title: "Model-Free Reinforcement Learning (Model-Free RL)"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "reinforcement-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---

# Model-Free Reinforcement Learning (Model-Free RL)

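**Model-Free RL** learns a policy or value function directly, without explicitly modeling the environment dynamics. By contrast, model-based RL learns a transition model T(s'|s,a) and a reward model R(s,a).
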
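A minimal sketch of what "learning directly" means, using tabular Q-learning (a classic value-based, model-free method) on a hypothetical five-state chain. The update consumes only sampled transitions (s, a, r, s') and never estimates T or R. The environment, the names `step` and `q_learning`, and all hyperparameters are illustrative assumptions.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed)
ALPHA = 0.5           # learning rate (assumed)

def step(state, action):
    """Toy environment. The agent only samples from it; it never reads T or R."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Uniformly random behaviour policy: Q-learning is off-policy,
            # so it still learns the values of the greedy policy.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # TD target built from one sampled transition (s, a, r, s').
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)  # [1, 1, 1, 1]: always move right, toward the goal
```

A random behaviour policy is enough here only because the chain is tiny; practical agents usually act ε-greedily and replace the table with a neural network (as in DQN).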
## Classic Algorithms

| Type | Algorithms |
|------|------------|
| Value-based | DQN, Rainbow |
| Policy-based | PPO, TRPO |
| Actor-Critic | TD3, SAC |

## Advantages

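- **Simple**: no world model to train or maintain
- **Efficient**: selecting an action takes a single forward pass per step, with no planning at decision time
- **Stable**: no compounding of model errors, since nothing is rolled out through a learned model
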
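To make the "one forward pass per step" point concrete, the sketch below counts network calls at decision time for a greedy model-free agent versus a naive exhaustive planner that searches through a learned model. `CountingFn`, `model_free_act`, `planning_act`, and the toy networks are hypothetical. The exhaustive search is chosen only because its cost (|A| + |A|^2 + ... + |A|^d model calls for depth d) is easy to read; real model-based agents use cheaper planners.

```python
class CountingFn:
    """Wraps a callable and counts calls (one call = one forward pass)."""
    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)

ACTIONS = (0, 1)

def model_free_act(q_net, state):
    # One forward pass returns Q-values for every action; act greedily.
    q_values = q_net(state)
    return max(range(len(q_values)), key=q_values.__getitem__)

def planning_act(model, state, depth, gamma=0.9):
    # Exhaustive depth-limited search through a learned model (s, a) -> (s', r).
    def value(s, d):
        if d == 0:
            return 0.0
        best = float("-inf")
        for a in ACTIONS:
            s_next, r = model(s, a)
            best = max(best, r + gamma * value(s_next, d - 1))
        return best

    best_a, best_v = None, float("-inf")
    for a in ACTIONS:
        s_next, r = model(state, a)
        v = r + gamma * value(s_next, depth - 1)
        if v > best_v:
            best_a, best_v = a, v
    return best_a

# Hypothetical toy networks: action 1 is always better.
q_net = CountingFn(lambda state: [0.0, 1.0])
model = CountingFn(lambda s, a: (s + (1 if a == 1 else -1), float(a)))

a_mf = model_free_act(q_net, 0)
a_plan = planning_act(model, 0, depth=3)
print(a_mf, q_net.calls)   # 1 1
print(a_plan, model.calls) # 1 14  (2 + 4 + 8 model calls)
```

Both agents pick the same action, but the planner's cost grows exponentially with depth while the model-free agent's stays constant.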
## Traditional Limitations

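- **Low sample efficiency**: without a model to learn from, the agent needs more environment interactions
- **Poor representation quality**: the representation is trained only by the TD error, a sparse learning signal
- **Poor scaling**: larger networks bring no gains (and can even degrade performance)
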
## A New Paradigm: Model-Free + Predictive Representations

[[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]] show that adding [[auxiliary-predictive-objectives|auxiliary predictive objectives]] to a model-free agent (such as [[mrq-algorithm|MR.Q]]) yields all of the following at once:

- the simplicity and efficiency of model-free methods
- the representation-learning benefits of model-based methods
- no planning cost

This represents a **third path** between model-free and model-based RL.

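A hedged sketch of how an auxiliary predictive objective can sit alongside the TD loss: a shared encoder feeds a value head and two predictive heads (next latent state and reward), and the losses are summed. This is not the exact objective of MR.Q or of Obando-Ceron et al.; the architecture, the parameter names (`W_enc`, `w_q`, `W_dyn`, `w_r`), and `aux_weight` are assumptions. It only computes the loss for one transition; training would differentiate it with an autodiff library.

```python
import math

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def matvec(W, x):
    return [dot(row, x) for row in W]

def one_hot(a, n):
    return [1.0 if i == a else 0.0 for i in range(n)]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def encode(W_enc, state):
    # Shared encoder: state -> latent z (a single tanh layer, for illustration).
    return [math.tanh(v) for v in matvec(W_enc, state)]

def losses(params, s, a, r, s_next, done, n_actions, gamma=0.99, aux_weight=1.0):
    """Return (td_loss, aux_loss, total) for one transition (s, a, r, s_next, done)."""
    z = encode(params["W_enc"], s)
    za = z + one_hot(a, n_actions)   # concatenate latent and action
    q_sa = dot(params["w_q"], za)    # value head on the shared latent

    # TD target; in practice it is stop-gradient and often uses a target network.
    z_next = encode(params["W_enc"], s_next)
    q_next = max(dot(params["w_q"], z_next + one_hot(b, n_actions))
                 for b in range(n_actions))
    td_target = r + (0.0 if done else gamma * q_next)
    td_loss = (q_sa - td_target) ** 2

    # Auxiliary predictive heads: next latent and reward, predicted from (z, a).
    # z_next serves as the prediction target (also stop-gradient in practice).
    z_next_pred = matvec(params["W_dyn"], za)
    r_pred = dot(params["w_r"], za)
    aux_loss = mse(z_next_pred, z_next) + (r_pred - r) ** 2

    return td_loss, aux_loss, td_loss + aux_weight * aux_loss

# Usage with tiny hand-set weights (2-d state, 2-d latent, 2 actions).
params = {
    "W_enc": [[1.0, 0.0], [0.0, 1.0]],
    "w_q": [0.0] * 4,
    "W_dyn": [[0.0] * 4 for _ in range(2)],
    "w_r": [0.0] * 4,
}
print(losses(params, [0.0, 0.0], 0, 1.0, [0.0, 0.0], True, n_actions=2))  # (1.0, 1.0, 2.0)
```

Because the encoder is shared, the auxiliary term gives it a vector-valued target on every transition, in addition to the single scalar TD error.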
## References

- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[mrq-algorithm|MR.Q]]
- [[world-models-rl|World Models]]
- [[predictive-representation-learning|Predictive Representation Learning]]