20260617: currently 914 pages
50
concepts/deep-rl-scaling.md
Normal file
@@ -0,0 +1,50 @@
---
title: "Scaling Deep Reinforcement Learning (Scaling Deep RL)"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "scaling-laws", "multitask-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---
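
# Scaling Deep Reinforcement Learning (Scaling Deep RL)

**Scaling Deep RL** studies how to keep improving RL performance by increasing model capacity, data volume, and task diversity, analogous to scaling-law research in language and vision.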
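
## Core Challenges

Unlike supervised learning, scaling RL faces distinctive obstacles:

1. **Non-stationary data**: every policy update changes the distribution of the data being collected
2. **Bootstrapping**: TD targets are built recursively from the network's own estimates, so errors compound
3. **Representation collapse**: large models develop dead neurons when the learning signal is sparse
4. **Loss of plasticity**: prolonged training erodes the network's ability to keep learning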
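
One way to make the "dead neurons" symptom concrete is to count the units in a ReLU layer that never activate on a batch of inputs. The sketch below is a minimal, dependency-free illustration; the helper names (`layer_activations`, `dead_fraction`), the weights, and the batch are all hypothetical and do not come from the cited work.

```python
def relu(x):
    """Rectified linear unit on a single scalar."""
    return x if x > 0.0 else 0.0


def layer_activations(weights, biases, batch):
    """Post-ReLU activations of one dense layer, as a [batch][units] nested list."""
    out = []
    for x in batch:
        row = []
        for w, b in zip(weights, biases):
            pre = sum(wi * xi for wi, xi in zip(w, x)) + b
            row.append(relu(pre))
        out.append(row)
    return out


def dead_fraction(activations, eps=0.0):
    """Fraction of units whose activation is <= eps on every input in the batch."""
    n_units = len(activations[0])
    dead = sum(
        1
        for j in range(n_units)
        if all(row[j] <= eps for row in activations)
    )
    return dead / n_units


# Hypothetical 2-input, 4-unit layer: the last two units have strongly
# negative pre-activations on this batch, so they never fire.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
biases = [0.0, 0.0, -1.0, -10.0]
batch = [[1.0, 2.0], [0.5, 0.0], [2.0, 3.0]]

acts = layer_activations(weights, biases, batch)
frac = dead_fraction(acts)
print(f"dead fraction: {frac}")
```

Tracking this fraction over training, per layer, is one simple diagnostic for whether added capacity is actually being used; a unit counted as dead on one batch may still fire on other inputs, so larger batches give a more reliable estimate.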
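
## Key Findings

The central scaling finding of [[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]]:

- **Without predictive representations**: larger models → performance plateaus or degrades
- **With predictive representations**: larger models → sustained performance gains

→ **Representation quality, not model capacity itself, is the bottleneck for scaling.**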
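
To illustrate what a predictive representation objective can look like, the sketch below adds a generic latent self-prediction term to an RL loss: the encoder's latent for the current observation, together with the action, is used to predict the latent of the next observation. This is an assumption made for illustration, not the objective from the cited paper; the linear `encode` and `predict_next_latent` functions, the `aux_weight` parameter, and all numbers are hypothetical.

```python
def encode(obs, W):
    """Linear encoder z = W @ obs (hypothetical stand-in for a network)."""
    return [sum(w * o for w, o in zip(row, obs)) for row in W]


def predict_next_latent(z, action, P):
    """Linear latent dynamics applied to the concatenation [z, action]."""
    x = z + [action]
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def combined_loss(td_error, obs, action, next_obs, W, P, aux_weight=1.0):
    """Return (total, rl_loss, aux_loss) for a single transition."""
    z = encode(obs, W)
    # In a real training loop the target latent is usually held fixed
    # (no gradient); here it is just a number, so no special handling is needed.
    z_next_target = encode(next_obs, W)
    z_next_pred = predict_next_latent(z, action, P)
    rl_loss = td_error ** 2
    aux_loss = mse(z_next_pred, z_next_target)
    return rl_loss + aux_weight * aux_loss, rl_loss, aux_loss


# Hypothetical toy setup: identity encoder, and dynamics that add the
# action to the first latent coordinate.
W = [[1.0, 0.0], [0.0, 1.0]]
P = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

# The model predicts [2, 2] but the next observation encodes to [3, 2].
total, rl_loss, aux_loss = combined_loss(
    td_error=0.5, obs=[1.0, 2.0], action=1.0, next_obs=[3.0, 2.0],
    W=W, P=P, aux_weight=0.5,
)
print(total, rl_loss, aux_loss)
```

The point of the design is that the auxiliary term supplies a dense training signal for the encoder on every transition, even when rewards are sparse.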

## Comparison with LLM/Vision Scaling

| Dimension | LLM/Vision | Deep RL |
|------|-----------|---------|
| Data | Static corpus | Online interaction |
| Supervision | Dense | Sparse / non-stationary |
| Targets | Fixed | Bootstrapped |
| Scaling bottleneck | Data volume | **Representation quality** |
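
## Practical Implications

1. Before scaling up the model, make sure a representation-learning mechanism is in place
2. [[predictive-representation-learning|Predictive objectives]] are a low-cost, high-return lever for scaling
3. Wall-clock efficiency deserves the same weight as sample efficiency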
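
The third point can be checked by logging environment steps and elapsed time side by side and comparing both at the moment a run first reaches a target return. The sketch below assumes a hypothetical log format of `(env_steps, wall_seconds, avg_return)` tuples; the two logs are made-up illustrative numbers, not measurements from any experiment.

```python
def first_crossing(log, target_return):
    """Return (env_steps, wall_seconds) at the first entry reaching target_return, or None."""
    for env_steps, wall_seconds, avg_return in log:
        if avg_return >= target_return:
            return env_steps, wall_seconds
    return None


# Made-up logs for a small and a large model (illustration only).
small_log = [(10_000, 60.0, 20.0), (20_000, 120.0, 55.0), (30_000, 180.0, 80.0)]
large_log = [(10_000, 300.0, 50.0), (20_000, 600.0, 85.0), (30_000, 900.0, 95.0)]

target = 80.0
small_hit = first_crossing(small_log, target)
large_hit = first_crossing(large_log, target)

print("small:", small_hit)
print("large:", large_hit)
```

In this invented example the larger model reaches the target in fewer environment steps but takes longer in wall-clock time, which is exactly the trade-off that reporting sample efficiency alone would hide.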

## References

- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[predictive-representation-learning|Predictive Representation Learning]]
- [[multitask-rl|Multitask RL]]