20260617: currently 914 pages
48
concepts/multitask-rl.md
Normal file
@@ -0,0 +1,48 @@
---
title: "Multitask Reinforcement Learning (Multitask RL)"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "multitask-learning", "transfer-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---

# Multitask Reinforcement Learning (Multitask RL, MTRL)

**Multitask RL** aims to train a single agent that maximizes expected return over a **task distribution** p(tau), rather than on a single task.

## Formalization

Each task tau defines an MDP M_tau = (S, A, T_tau, R_tau, gamma). The state and action spaces are usually shared, while the dynamics and rewards vary by task.

Objective:
```
E_{tau~p(tau), pi} [ sum_t gamma^t * r_t ]
```
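
The objective above can be approximated by sampling tasks and averaging discounted returns. The following is a minimal sketch, not taken from the cited source: the `Task` container, the deterministic toy dynamics, the finite `horizon` that truncates the infinite sum, and all function names are assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Task:
    """One task tau: shared state/action spaces, task-specific T_tau and R_tau."""
    transition: Callable[[int, int], int]  # T_tau(s, a) -> s' (deterministic for simplicity)
    reward: Callable[[int, int], float]    # R_tau(s, a) -> r


def discounted_return(task: Task, policy: Callable[[int], int],
                      s0: int, gamma: float, horizon: int) -> float:
    """Compute sum_t gamma^t * r_t for one rollout, truncated at `horizon` steps."""
    state, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        total += discount * task.reward(state, action)
        state = task.transition(state, action)
        discount *= gamma
    return total


def multitask_objective(tasks: Sequence[Task], weights: Sequence[float],
                        policy: Callable[[int], int], gamma: float = 0.9,
                        horizon: int = 50, n_samples: int = 1000,
                        seed: int = 0) -> float:
    """Monte Carlo estimate of E_{tau ~ p(tau)} [ discounted return under policy ]."""
    rng = random.Random(seed)
    sampled = rng.choices(tasks, weights=weights, k=n_samples)  # tau ~ p(tau)
    returns = [discounted_return(t, policy, 0, gamma, horizon) for t in sampled]
    return sum(returns) / n_samples


# Toy setup: two tasks that reward opposite actions, and a policy that always picks 0.
stay = lambda s, a: s
task_a = Task(transition=stay, reward=lambda s, a: 1.0 if a == 0 else 0.0)
task_b = Task(transition=stay, reward=lambda s, a: 1.0 if a == 1 else 0.0)
always_zero = lambda s: 0

estimate = multitask_objective([task_a, task_b], [0.5, 0.5], always_zero)
print(f"estimated multitask objective: {estimate:.3f}")
```

Because the fixed policy only solves `task_a`, the estimate lands between the two per-task returns, which illustrates why a single policy has to trade off performance across the task distribution.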

## Core Challenges
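
1. **Task Interference**: the shared representation must support multiple, possibly conflicting objectives, and gradients from different tasks may not align.

2. **Non-stationarity**: the data distribution shifts as the task mixture changes, violating the i.i.d. assumption.

3. **Underutilization**: in multitask settings, large models often fail to make effective use of the added parameters.

4. **Representation Bottleneck**: [[predictive-representations-scalable-mtrl|Obando-Ceron et al.]] argue that representation quality is the core bottleneck.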
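
One common way to make the task-interference point concrete is to measure the cosine similarity between per-task gradients on the shared parameters: a negative value means a step that helps one task hurts the other. The sketch below is illustrative only; the quadratic toy losses and the helper names (`cosine_similarity`, `gradients_conflict`, `quadratic_grad`) are assumptions, not part of the cited work.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two gradient vectors (0.0 if either is zero)."""
    norm_u, norm_v = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def gradients_conflict(g_a: Sequence[float], g_b: Sequence[float]) -> bool:
    """Two task gradients conflict when they point in opposing directions."""
    return cosine_similarity(g_a, g_b) < 0.0


def quadratic_grad(w: Sequence[float], target: Sequence[float]) -> list:
    """Gradient of the toy per-task loss 0.5 * ||w - target||^2 with respect to w."""
    return [wi - ti for wi, ti in zip(w, target)]


# Shared parameters, with two tasks whose optima lie in roughly opposite directions.
w = [0.0, 0.0]
g_a = quadratic_grad(w, [1.0, 0.0])
g_b = quadratic_grad(w, [-1.0, 0.5])

print("cosine similarity:", cosine_similarity(g_a, g_b))
print("conflict:", gradients_conflict(g_a, g_b))
```

In a real agent the vectors would be flattened gradients of each task's loss with respect to the shared encoder, but the diagnostic is the same.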

## Taxonomy of Methods

| Method | Representative | Core mechanism |
|------|------|---------|
| Model-Based | Newt, Dreamer | Shared world model + planning |
| Model-Free + predictive representations | [[mrq-algorithm|MR.Q]] | Auxiliary prediction objectives shape the representation |
| Pure Model-Free | PPO, SAC | Reward supervision only |
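
The difference between the last two rows can be sketched as a difference in the training loss: reward-only supervision versus the same loss plus an auxiliary prediction term. The code below is a generic illustration and is not the actual MR.Q objective; the one-step TD loss, the latent-prediction MSE, and the `aux_weight` parameter are all assumptions chosen for simplicity.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a predicted and a target latent vector."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def td_loss(q_sa: float, reward: float, gamma: float, next_q_max: float) -> float:
    """Reward-driven loss: squared one-step TD error (hypothetical stand-in for the RL loss)."""
    target = reward + gamma * next_q_max
    return 0.5 * (q_sa - target) ** 2


def total_loss(q_sa: float, reward: float, gamma: float, next_q_max: float,
               pred_next_latent: Sequence[float],
               target_next_latent: Sequence[float],
               aux_weight: float = 1.0) -> float:
    """RL loss plus a weighted auxiliary term for predicting the next latent state.

    aux_weight = 0.0 recovers the reward-only ("pure model-free") case.
    """
    rl = td_loss(q_sa, reward, gamma, next_q_max)
    aux = mse(pred_next_latent, target_next_latent)
    return rl + aux_weight * aux


reward_only = total_loss(1.0, 0.5, 0.9, 1.0, [0.0, 0.0], [1.0, 1.0], aux_weight=0.0)
with_prediction = total_loss(1.0, 0.5, 0.9, 1.0, [0.0, 0.0], [1.0, 1.0], aux_weight=0.5)
print("reward-only loss:", reward_only)
print("loss with auxiliary prediction:", with_prediction)
```

The auxiliary term gives the encoder a learning signal even when rewards are sparse or conflict across tasks, which is the intuition behind the "shapes the representation" entry in the table.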

## Scaling Behavior
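
Key finding: pure model-free methods gain nothing, or even degrade, as model size grows; adding [[predictive-representation-learning|predictive representation learning]] yields continued improvement. Representation quality is the bottleneck for scaling.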

## References
- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[predictive-representation-learning|Predictive Representation Learning]]
- [[deep-rl-scaling|Scaling Deep RL]]