20260617: currently 914 pages
concepts/model-free-rl.md
---
title: "Model-Free Reinforcement Learning (Model-Free RL)"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "reinforcement-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---

# Model-Free Reinforcement Learning (Model-Free RL)

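**Model-free RL** learns a policy or value function directly from sampled experience, without explicitly modeling the environment dynamics. By contrast, model-based RL learns a transition model T(s'|s,a) and a reward model R(s,a).
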
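As a minimal illustration of learning a value function directly, the sketch below applies the standard tabular Q-learning update to sampled transitions (s, a, r, s'). Nothing in it estimates T(s'|s,a) or R(s,a): the environment is only queried for samples. The 5-state chain (`step`), its reward, and all hyperparameters are hypothetical choices made for illustration.

```python
import random

N_STATES, N_ACTIONS = 5, 2  # hypothetical chain: action 1 moves right, action 0 moves left
GOAL = N_STATES - 1


def step(s: int, a: int) -> tuple[int, float, bool]:
    """Toy environment. The agent never inspects this code; it only sees the samples it returns."""
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL


def q_learning(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy behaviour policy with random tie-breaking among greedy actions
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)
            else:
                best = max(q[s])
                a = rng.choice([b for b in range(N_ACTIONS) if q[s][b] == best])
            s_next, r, done = step(s, a)
            # Model-free TD update: the target uses only the sampled (r, s_next),
            # never an estimate of T(s'|s,a) or R(s,a).
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q
```

After `q = q_learning()`, the greedy action in every non-goal state should be "move right" (action 1), learned purely from interaction.
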
## Classic Algorithms

| Family | Representative algorithms |
|------|------|
| Value-based | DQN, Rainbow |
| Policy-based (on-policy policy gradient) | PPO, TRPO |
| Actor-Critic (off-policy) | TD3, SAC |

## Advantages

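- **Simple**: no world model to learn or maintain
- **Efficient**: acting requires only one forward pass per step, with no planning over a model
- **Stable**: no compounding of model error, because no learned model is rolled out over multiple steps
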
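The "one forward pass per step" and "no compounding model error" points can be made concrete with a toy sketch. `model_free_act` picks an action from a single call to a Q-network; `random_shooting_act` stands in for a simple model-based planner (random-shooting MPC, chosen here only as one example of planning) and calls the learned model `n_candidates * horizon` times per decision. `rollout_errors` shows how a small one-step error in a hypothetical linear model grows over a multi-step rollout. All names, the call-counting wrapper, and the toy dynamics are assumptions for illustration.

```python
import random


class CountingFn:
    """Wraps a function and counts calls, as a stand-in for network forward passes."""

    def __init__(self, fn):
        self.fn, self.calls = fn, 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)


def model_free_act(q_net, state):
    values = q_net(state)  # one forward pass returns Q(s, a) for every action
    return max(range(len(values)), key=values.__getitem__)


def random_shooting_act(model, state, actions, n_candidates, horizon, rng):
    best_return, best_first = float("-inf"), None
    for _ in range(n_candidates):
        seq = [rng.choice(actions) for _ in range(horizon)]
        s, total = state, 0.0
        for a in seq:
            s, r = model(s, a)  # one forward pass of the learned model per imagined step
            total += r
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first


def rollout_errors(a, eps, horizon, x0=1.0):
    """Open-loop error of a slightly wrong model x' = a(1 + eps)x against the true x' = a x."""
    a_hat = a * (1.0 + eps)
    x_true, x_model, errors = x0, x0, []
    for _ in range(horizon):
        x_true, x_model = a * x_true, a_hat * x_model
        errors.append(abs(x_model - x_true))
    return errors
```

With 8 candidates and a horizon of 5, the planner makes 40 model calls per decision where the model-free agent makes one; and with `a = 1.0, eps = 0.01` the rollout error grows from 0.01 after one step to about 0.1 after ten.
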
## Traditional Limitations

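- **Low sample efficiency**: without a model to assist learning, more environment interaction is needed
- **Poor representation quality**: representations are shaped only by the TD error, a sparse learning signal
- **Poor scaling**: larger networks bring no gains (and can even degrade performance)
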
## A New Paradigm: Model-Free + Predictive Representations

[[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]] show that adding [[auxiliary-predictive-objectives|auxiliary predictive objectives]] to a model-free agent (such as [[mrq-algorithm|MR.Q]]) yields all of the following at once:

- the simplicity and efficiency of model-free RL
- the representation-learning benefits of model-based RL
- without paying the cost of planning

This represents a **third path** between model-free and model-based RL.

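A minimal NumPy sketch of the idea, under loud assumptions: a shared encoder produces a state-action embedding `z_sa`; a linear value head gives Q(s,a) and is scored with a one-step TD loss, while two auxiliary heads must predict the reward and the next-state embedding from the same `z_sa`. The architecture, the loss weighting, and every name (`init_params`, `combined_loss`, `aux_weight`, ...) are hypothetical. This is not MR.Q's actual architecture or loss, nor the method of Obando-Ceron et al. It only evaluates the objective; in practice it would be minimized with an autodiff framework.

```python
import numpy as np

STATE_DIM, N_ACTIONS, EMBED_DIM = 4, 3, 8  # hypothetical sizes


def init_params(rng):
    scale = 0.1
    return {
        "W_s": scale * rng.standard_normal((EMBED_DIM, STATE_DIM)),                # state encoder
        "W_sa": scale * rng.standard_normal((EMBED_DIM, EMBED_DIM + N_ACTIONS)),  # state-action encoder
        "w_q": scale * rng.standard_normal(EMBED_DIM),                            # value head
        "w_r": scale * rng.standard_normal(EMBED_DIM),                            # auxiliary reward head
        "W_dyn": scale * rng.standard_normal((EMBED_DIM, EMBED_DIM)),             # auxiliary next-embedding head
    }


def encode_state(p, s):
    return np.tanh(s @ p["W_s"].T)  # (batch, EMBED_DIM)


def encode_state_action(p, z_s, a):
    onehot = np.eye(N_ACTIONS)[a]  # a: integer array of shape (batch,)
    return np.tanh(np.concatenate([z_s, onehot], axis=-1) @ p["W_sa"].T)


def q_values(p, z_s):
    # Q(s, b) for every action b, each computed from the shared state-action embedding
    cols = [encode_state_action(p, z_s, np.full(len(z_s), b)) @ p["w_q"] for b in range(N_ACTIONS)]
    return np.stack(cols, axis=-1)  # (batch, N_ACTIONS)


def combined_loss(p, batch, gamma=0.99, aux_weight=1.0):
    s, a, r, s_next, done = batch
    z_s, z_next = encode_state(p, s), encode_state(p, s_next)
    z_sa = encode_state_action(p, z_s, a)

    # Model-free TD loss; with autodiff the target would be wrapped in a stop-gradient
    q_sa = z_sa @ p["w_q"]
    target = r + gamma * (1.0 - done) * q_values(p, z_next).max(axis=-1)
    td_loss = np.mean((q_sa - target) ** 2)

    # Auxiliary predictive losses: the same embedding must predict the reward and the
    # next-state embedding (here from the same encoder; a slowly updated target encoder
    # is a common alternative).
    reward_loss = np.mean((z_sa @ p["w_r"] - r) ** 2)
    dyn_loss = np.mean(np.sum((z_sa @ p["W_dyn"].T - z_next) ** 2, axis=-1))

    total = td_loss + aux_weight * (reward_loss + dyn_loss)
    return total, {"td": td_loss, "reward": reward_loss, "dynamics": dyn_loss}
```

The design point is that the auxiliary terms give the shared encoder `EMBED_DIM + 1` regression targets per transition instead of the single scalar TD error, which is the denser signal that the "poor representation quality" limitation is about, while action selection still needs only one pass through `q_values` and no planning.
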
## References

- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[mrq-algorithm|MR.Q]]
- [[world-models-rl|World Models]]
- [[predictive-representation-learning|Predictive Representation Learning]]