---
title: "Scaling Deep Reinforcement Learning (Scaling Deep RL)"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "scaling-laws", "multitask-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---
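# Scaling Deep Reinforcement Learning (Scaling Deep RL)
**Scaling Deep RL** studies how to keep improving RL performance by increasing model capacity, data volume, and task diversity, analogous to scaling-law research in language and vision.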
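## Core Challenges
Unlike supervised learning, scaling RL faces distinctive obstacles:
1. **Non-stationary data**: each policy update shifts the data distribution
2. **Bootstrapping**: the recursive nature of TD targets amplifies errors
3. **Representation collapse**: large models develop dead neurons under sparse learning signals
4. **Loss of plasticity**: prolonged training erodes the network's ability to keep learning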
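
The dead-neuron symptom in item 3 can be monitored during training. The sketch below is a minimal illustration, not a method taken from the cited work: it assumes one common heuristic, in which each unit is scored by its mean absolute activation divided by the layer-wide average, and a unit counts as dormant when that score is at or below a threshold. The function name `dormant_fraction` and the threshold `tau` are hypothetical names chosen for this example.

```python
def dormant_fraction(activations, tau=0.1):
    """Return the fraction of units in one layer whose normalized score is <= tau.

    activations: list of rows, one per input sample; each row holds the
    post-activation outputs of every unit in the layer.
    tau: dormancy threshold (an assumption for illustration, not a tuned value).
    """
    if not activations or not activations[0]:
        raise ValueError("activations must be a non-empty 2D list")
    n_samples = len(activations)
    n_units = len(activations[0])

    # Mean absolute activation of each unit across the batch.
    mean_abs = [
        sum(abs(row[j]) for row in activations) / n_samples
        for j in range(n_units)
    ]

    layer_mean = sum(mean_abs) / n_units
    if layer_mean == 0.0:
        # Every unit is silent on this batch.
        return 1.0

    # Normalize by the layer average so the score is scale-independent.
    scores = [m / layer_mean for m in mean_abs]
    return sum(1 for s in scores if s <= tau) / n_units


# Toy batch: units 0 and 2 never fire, units 1 and 3 do.
batch = [
    [0.0, 1.2, 0.0, 0.4],
    [0.0, 0.8, 0.0, 0.6],
    [0.0, 1.0, 0.0, 0.5],
]
print(dormant_fraction(batch))  # prints 0.5
```

Tracking this fraction per layer as the model grows gives a rough signal of whether added capacity is actually being used.
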
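## Key Findings
Core scaling findings from [[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]]:
- **Without predictive representations**: larger models → performance plateaus or degrades
- **With predictive representations**: larger models → performance keeps improving

**Representation quality is the scaling bottleneck**, not model capacity itself.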
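## Comparison with LLM/Vision Scaling
| Dimension | LLM/Vision | Deep RL |
|-----------|------------|---------|
| Data | Static corpus | Online interaction |
| Supervision | Dense | Sparse / non-stationary |
| Target | Static | Bootstrapped |
| Scaling bottleneck | Data volume | **Representation quality** |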
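
The "Target" row can be made concrete with a small sketch. It assumes the standard one-step TD target, r + γ·V(s'), and the function names are illustrative only. A supervised label is fixed, while the TD target depends on the model's own current value estimate, so an error ε in V(s') reappears in the target scaled by γ.

```python
def supervised_target(label):
    """A supervised target is a fixed label; it does not depend on the model."""
    return label


def td_target(reward, next_value, gamma=0.99, done=False):
    """One-step TD target: r + gamma * V(s'), with no bootstrap at episode end."""
    if done:
        return reward
    return reward + gamma * next_value


# The same transition yields a different target as the value estimate changes.
early = td_target(reward=1.0, next_value=0.0, gamma=0.9)
later = td_target(reward=1.0, next_value=10.0, gamma=0.9)
print(early, later)  # prints 1.0 10.0
```

Because the regression target moves whenever the value estimate moves, the optimization problem itself is non-stationary even on a fixed batch of transitions.
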
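## Practical Implications
1. Before scaling up the model, make sure a representation-learning mechanism is in place
2. [[predictive-representation-learning|Predictive objectives]] are a low-cost, high-return lever for scaling
3. Wall-clock efficiency deserves as much weight as sample efficiency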
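
As a rough sketch of item 2, a predictive objective is often attached as an auxiliary term next to the RL loss. The code below assumes a generic latent-prediction setup (mean squared error between a predicted next latent and a target next latent that is treated as a constant); it is not necessarily the objective used in the cited work. The names `total_loss` and `aux_weight` are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(rl_loss, predicted_next_latent, target_next_latent, aux_weight=1.0):
    """Combine an RL loss with an auxiliary latent-prediction loss.

    rl_loss: scalar loss from the RL objective (for example a TD loss).
    predicted_next_latent: the model's prediction of the next-step latent.
    target_next_latent: the encoder's latent for the actual next observation,
        assumed here to be a fixed target (no gradient flows through it).
    aux_weight: illustrative weighting coefficient, not a recommended value.
    """
    return rl_loss + aux_weight * mse(predicted_next_latent, target_next_latent)


loss = total_loss(
    rl_loss=0.5,
    predicted_next_latent=[1.0, 2.0],
    target_next_latent=[1.0, 4.0],
    aux_weight=0.25,
)
print(loss)  # prints 1.0
```

The auxiliary term adds one extra prediction head and one extra loss evaluation per update, which is why it is usually a small cost relative to growing the backbone.
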
## References
- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[predictive-representation-learning|Predictive Representation Learning]]
- [[multitask-rl|Multitask RL]]