20260617: 914 pages so far

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions

concepts/model-free-rl.md Normal file

@@ -0,0 +1,48 @@
---
title: "Model-Free Reinforcement Learning (Model-Free RL)"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "reinforcement-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---
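# Model-Free Reinforcement Learning (Model-Free RL)
**Model-Free RL** learns a policy or value function directly, without explicitly modeling the environment dynamics. Model-based RL, by contrast, learns a transition model T(s'|s,a) and a reward model R(s,a).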
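
As a minimal sketch of the definition above, the following tabular Q-learning loop learns a value function purely from sampled transitions and never fits T(s'|s,a) or R(s,a). The 5-state chain environment, the function names, and the hyperparameters are assumptions made for illustration; none of them come from the cited sources.

```python
import random

N_STATES = 5          # hypothetical chain 0..4; state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Environment step. The agent only samples from this; it never reads its rules."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy (off-policy)."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target from a sampled (s, a, r, s'); no T or R is ever fitted.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(q_learning()):
        print(s, [round(v, 3) for v in row])
```

A model-based agent would instead use the same transitions to fit estimates of T and R and then plan with them. Here the transitions are consumed once by the value update and discarded.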
## Classic algorithms
| Type | Algorithms |
|------|------|
| Value-based | DQN, Rainbow |
| Policy-based | PPO, TRPO |
| Actor-Critic | TD3, SAC |
## Advantages
- **Simple**: no world model to maintain
- **Efficient**: acting takes a single forward pass per step
- **Stable**: no compounding of model error
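
The efficiency point can be made concrete by counting function calls. The sketch below compares a model-free agent, which scores all actions in one call, against a simple random-shooting planner that must query a model once per simulated step. `toy_q_function`, `toy_model`, the 1-D goal task, and the planner settings are hypothetical stand-ins, not the methods from the cited work.

```python
import random

ACTIONS = (-1, 0, 1)
GOAL = 3


class CallCounter:
    """Wraps a function and counts how many times it is invoked."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)


def toy_q_function(state):
    """Hypothetical stand-in for a Q-network: one call scores every action."""
    return [-abs(state + a - GOAL) for a in ACTIONS]


def toy_model(state, action):
    """Hypothetical stand-in for a learned model: returns (next_state, reward)."""
    next_state = state + action
    return next_state, -abs(next_state - GOAL)


def act_model_free(q_fn, state):
    scores = q_fn(state)  # the single forward pass
    return ACTIONS[scores.index(max(scores))]


def act_by_planning(model, state, num_candidates=64, horizon=5, seed=0):
    """Random-shooting planner: roll random action sequences through the model."""
    rng = random.Random(seed)
    best_return, best_first_action = float("-inf"), ACTIONS[0]
    for _ in range(num_candidates):
        plan = [rng.choice(ACTIONS) for _ in range(horizon)]
        s, total = state, 0.0
        for a in plan:
            s, r = model(s, a)  # one model call per simulated step
            total += r
        if total > best_return:
            best_return, best_first_action = total, plan[0]
    return best_first_action
```

With these settings the planner makes `num_candidates * horizon` model calls for every real action, while the model-free agent makes one call to its value function.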
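## Traditional limitations
- **Low sample efficiency**: with no model to assist learning, more environment interaction is needed
- **Poor representation quality**: representations are driven only by the TD error, which is a sparse signal
- **Poor scaling**: increasing model size brings no gain, and can even degrade performance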
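
To illustrate why the TD error can be a sparse signal, the sketch below computes one-step TD errors, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), along a trajectory whose only reward arrives at the last step. The trajectory and the zero-initialized value estimates are assumptions chosen for illustration.

```python
def td_errors(rewards, values, gamma=0.99):
    """One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds one more entry than `rewards`; the final entry is the value
    of the last state reached (0.0 if that state is terminal).
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


# Hypothetical 4-step episode with a single reward at the end.
rewards = [0.0, 0.0, 0.0, 1.0]
values = [0.0] * 5  # freshly initialized value estimates

deltas = td_errors(rewards, values)
nonzero_steps = [t for t, d in enumerate(deltas) if d != 0.0]
```

Under these assumptions only the final transition produces a nonzero error, so every earlier state contributes no learning signal until value estimates propagate backward over repeated updates.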
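## New paradigm: Model-Free + predictive representations
[[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]] show that adding [[auxiliary-predictive-objectives|auxiliary predictive objectives]] to a model-free agent (such as [[mrq-algorithm|MR.Q]]) yields all of the following at once:
- The simplicity and efficiency of model-free methods
- The representation-learning benefits of model-based methods
- No planning cost
This represents a **third path** between model-free and model-based RL.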
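
The general shape of such an objective can be sketched as a TD loss plus a weighted latent-prediction loss. This is a generic illustration only, not the loss used by Obando-Ceron et al. or by MR.Q. The linear encoder, the linear dynamics head, `aux_weight`, and every function name are hypothetical.

```python
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def encode(obs, enc_weights):
    """Hypothetical linear encoder: z = W @ obs."""
    return [sum(w * o for w, o in zip(row, obs)) for row in enc_weights]


def predict_next_latent(z, action, dyn_weights):
    """Hypothetical latent dynamics head, linear in [z, action]."""
    inputs = z + [float(action)]
    return [sum(w * i for w, i in zip(row, inputs)) for row in dyn_weights]


def q_value(z, action, q_weights):
    return sum(w * zi for w, zi in zip(q_weights[action], z))


def combined_loss(transition, params, gamma=0.99, aux_weight=1.0):
    """Returns (total, td_loss, aux_loss) for one transition."""
    obs, action, reward, next_obs, done = transition
    z = encode(obs, params["enc"])
    # In a gradient-based implementation this target would typically be held fixed.
    z_next = encode(next_obs, params["enc"])

    n_actions = len(params["q"])
    bootstrap = 0.0 if done else gamma * max(
        q_value(z_next, a, params["q"]) for a in range(n_actions)
    )
    td_loss = (q_value(z, action, params["q"]) - (reward + bootstrap)) ** 2

    # Auxiliary objective: predict the next latent state from (z, action).
    aux_loss = mse(predict_next_latent(z, action, params["dyn"]), z_next)
    return td_loss + aux_weight * aux_loss, td_loss, aux_loss


# Usage: a zero-reward transition with untrained (all-zero) heads.
params = {
    "enc": [[1.0, 0.0], [0.0, 1.0]],
    "q": [[0.0, 0.0], [0.0, 0.0]],
    "dyn": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
}
transition = ([1.0, 0.0], 1, 0.0, [0.0, 1.0], False)
total, td_loss, aux_loss = combined_loss(transition, params)
```

In this usage example the TD loss is zero because the reward is zero and the value head is untrained, yet the auxiliary term is nonzero, so the encoder would still receive a training signal. The dynamics head is used only inside the loss; action selection still needs just the value head, so no planning is involved.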
## References
- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[mrq-algorithm|MR.Q]]
- [[world-models-rl|World Models]]
- [[predictive-representation-learning|Predictive Representation Learning]]