20260617: currently 914 pages
47
concepts/representation-learning-rl.md
Normal file
@@ -0,0 +1,47 @@
---
title: "Representation Learning in RL"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "representation-learning", "self-supervised-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---
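
# Representation Learning in RL

In deep RL, **representation learning** is concerned with learning state/observation representations that are useful for decision-making, rather than relying on the reward signal alone.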
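
## Why reward supervision is not enough

- **Sparsity**: the reward signal can be extremely sparse (e.g., in Go it arrives only at the end of the game)
- **Non-stationarity**: policy updates → data distribution shifts → old representations become stale
- **TD variance**: a poor representation amplifies bootstrapping error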

## Sources of signal for representation learning

### 1. Reconstruction objectives
Learn an encoder-decoder pair so that the decoded latent reproduces the input: z_t = encoder(s_t), s_t ≈ decoder(z_t)
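
The reconstruction objective can be sketched as follows. This is a minimal illustration, assuming a linear encoder and decoder, a mean-squared-error loss, and plain gradient descent; the note does not specify any of these, and the names `encode`, `decode`, `reconstruction_loss`, and `train_step` are hypothetical.

```python
import numpy as np

def encode(W_enc, s):
    """Map states s (batch, state_dim) to latents z (batch, latent_dim)."""
    return s @ W_enc

def decode(W_dec, z):
    """Map latents z back to reconstructed states (batch, state_dim)."""
    return z @ W_dec

def reconstruction_loss(W_enc, W_dec, s):
    """Mean squared error between s and decoder(encoder(s))."""
    s_hat = decode(W_dec, encode(W_enc, s))
    return float(np.mean((s_hat - s) ** 2))

def train_step(W_enc, W_dec, s, lr=0.05):
    """One gradient-descent step on the MSE loss (gradients derived by hand)."""
    z = encode(W_enc, s)
    err = decode(W_dec, z) - s                  # (batch, state_dim)
    scale = 2.0 / err.size                      # derivative of the mean of squares
    grad_dec = scale * (z.T @ err)              # dL/dW_dec
    grad_enc = scale * (s.T @ (err @ W_dec.T))  # dL/dW_enc
    return W_enc - lr * grad_enc, W_dec - lr * grad_dec

# Toy data: random 8-dimensional "states" compressed to a 3-dimensional latent.
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 8))
W_enc = 0.1 * rng.normal(size=(8, 3))
W_dec = 0.1 * rng.normal(size=(3, 8))

loss_before = reconstruction_loss(W_enc, W_dec, states)
for _ in range(200):
    W_enc, W_dec = train_step(W_enc, W_dec, states)
loss_after = reconstruction_loss(W_enc, W_dec, states)
```

In an RL agent the encoder would typically be a neural network shared with the policy or value function, and `encode(W_enc, states)` would be the representation passed downstream.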

### 2. Contrastive objectives
Pull positive pairs together and push negative pairs apart, in the style of SimCLR
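
One way to make the positive/negative contrast concrete is an InfoNCE-style loss. The sketch below is a simplified, one-directional variant and not SimCLR's exact NT-Xent loss (which symmetrizes over both views); the function name `info_nce_loss` and the temperature value are assumptions for illustration.

```python
import numpy as np

def info_nce_loss(z_anchor, z_positive, temperature=0.1):
    """Row i of z_positive is the positive for row i of z_anchor;
    every other row in the batch serves as a negative."""
    # L2-normalize so that dot products are cosine similarities.
    a = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    p = z_positive / np.linalg.norm(z_positive, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                      # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching index as the correct "class".
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 16))

# Positives that are small perturbations of the anchors give a low loss.
aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))
# Pairing each anchor with a different sample's embedding gives a high loss.
mismatched = info_nce_loss(z, np.roll(z, 1, axis=0))
```

In an RL setting the two embeddings would usually come from two augmentations of the same observation, or from temporally adjacent observations; which pairing to use is a design choice this note leaves open.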

### 3. [[auxiliary-predictive-objectives|Predictive objectives]]
Predict future states and rewards: z_{t+1}, r_t, d_t ← (z_t, a_t)

Predictive objectives are the core method of [[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]], where they are shown to be critical for scaling behavior.
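
The mapping z_{t+1}, r_t, d_t ← (z_t, a_t) can be illustrated with three prediction heads that share one input. This is a rough sketch under several assumptions not stated in the note: the heads are linear, d_t is read as an episode-termination flag, and the losses are MSE for the latent and reward plus binary cross-entropy for termination. It is not the cited paper's implementation, and `predictive_loss` and its parameter names are hypothetical.

```python
import numpy as np

def predictive_loss(params, z_t, a_t, z_next, r_t, d_t):
    """Losses for predicting (z_{t+1}, r_t, d_t) from the pair (z_t, a_t)."""
    x = np.concatenate([z_t, a_t], axis=1)   # (batch, latent_dim + action_dim)
    z_pred = x @ params["W_z"]               # predicted next latent
    r_pred = x @ params["w_r"]               # predicted reward, shape (batch,)
    d_logit = x @ params["w_d"]              # termination logit, shape (batch,)
    latent = np.mean((z_pred - z_next) ** 2)
    reward = np.mean((r_pred - r_t) ** 2)
    # Binary cross-entropy on logits, in a numerically stable form.
    done = np.mean(np.maximum(d_logit, 0.0) - d_logit * d_t
                   + np.log1p(np.exp(-np.abs(d_logit))))
    return {"latent": float(latent), "reward": float(reward),
            "done": float(done), "total": float(latent + reward + done)}

rng = np.random.default_rng(0)
batch, latent_dim, action_dim = 64, 4, 2
z_t = rng.normal(size=(batch, latent_dim))
a_t = rng.normal(size=(batch, action_dim))
x = np.concatenate([z_t, a_t], axis=1)

# Synthetic targets generated by a known linear model, for illustration only.
W_true = rng.normal(size=(latent_dim + action_dim, latent_dim))
w_true = rng.normal(size=latent_dim + action_dim)
z_next, r_t = x @ W_true, x @ w_true
d_t = np.zeros(batch)                        # no terminations in this toy batch

zero_params = {"W_z": np.zeros_like(W_true),
               "w_r": np.zeros_like(w_true),
               "w_d": np.zeros_like(w_true)}
true_params = {"W_z": W_true, "w_r": w_true, "w_d": np.zeros_like(w_true)}

untrained = predictive_loss(zero_params, z_t, a_t, z_next, r_t, d_t)
fitted = predictive_loss(true_params, z_t, a_t, z_next, r_t, d_t)
```

In a real agent, `z_next` would usually be the encoder's output for the next observation (often computed with a target network or stop-gradient), and the total loss would be added to the RL loss as an auxiliary term.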
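
## Measuring representation quality

- **Linear probing**: train a linear classifier on top of the frozen representation
- **Few-shot fine-tuning**: evaluate how quickly the representation adapts to a new task
- **Neuron-level analysis**: fraction of dead neurons (an indicator of representation collapse)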
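
Two of these measurements are simple enough to sketch directly. The code below is an illustration under stated assumptions: a neuron counts as "dead" if its post-ReLU activation never exceeds a threshold on a batch, and the linear probe is fit by ridge regression onto one-hot labels rather than by the more common logistic regression. The function names and the toy data are hypothetical.

```python
import numpy as np

def dead_neuron_fraction(activations, threshold=0.0):
    """Fraction of units whose activation never exceeds `threshold`
    on any input in the batch. activations: (batch, num_units)."""
    dead = np.all(activations <= threshold, axis=0)
    return float(np.mean(dead))

def linear_probe_accuracy(train_x, train_y, test_x, test_y, num_classes, ridge=1e-3):
    """Fit a linear classifier on frozen features, then score held-out data."""
    def with_bias(x):
        return np.concatenate([x, np.ones((len(x), 1))], axis=1)
    X = with_bias(train_x)
    Y = np.eye(num_classes)[train_y]                      # one-hot targets
    # Closed-form ridge regression; the features themselves are never updated.
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
    pred = np.argmax(with_bias(test_x) @ W, axis=1)
    return float(np.mean(pred == test_y))

# Dead-neuron check on a tiny hand-written activation matrix:
# unit 0 is never active, units 1 and 2 are active at least once.
acts = np.array([[0.0, 1.2, 0.0],
                 [0.0, 0.3, 0.0],
                 [0.0, 0.0, 2.0]])
frac_dead = dead_neuron_fraction(acts)

# Linear probe on synthetic "frozen features" whose label depends on feature 0.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 5))
labels = (feats[:, 0] > 0).astype(int)
acc = linear_probe_accuracy(feats[:150], labels[:150],
                            feats[150:], labels[150:], num_classes=2)
```

A high probe accuracy suggests the label is linearly decodable from the representation, while a rising dead-neuron fraction during training is one warning sign of collapse.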
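
## Special role in multitask RL

The multitask setting raises the demands placed on the representation: a shared representation must generalize across tasks. [[predictive-representation-learning|Predictive representation learning]] is task-agnostic (dynamics prediction does not depend on any particular reward function), which makes it a natural fit for multitask transfer.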
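
The structure described here, a shared encoder with task-specific heads and a dynamics loss that never sees the reward, can be sketched as below. Everything in this example is a hypothetical illustration: the linear encoder, the task names, the value heads, and the two reward functions are assumptions, not details from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, latent_dim = 6, 2, 4

W_enc = rng.normal(size=(state_dim, latent_dim))                # shared encoder
W_dyn = rng.normal(size=(latent_dim + action_dim, latent_dim))  # shared dynamics head
value_heads = {                                                 # one head per task
    "task_a": rng.normal(size=latent_dim),
    "task_b": rng.normal(size=latent_dim),
}

def dynamics_loss(s, a, s_next):
    """Latent next-state prediction error. No reward appears anywhere here."""
    z, z_next = s @ W_enc, s_next @ W_enc
    z_pred = np.concatenate([z, a], axis=1) @ W_dyn
    return float(np.mean((z_pred - z_next) ** 2))

def task_value(task, s):
    """Task-specific value estimate computed on the shared representation."""
    return (s @ W_enc) @ value_heads[task]

s = rng.normal(size=(32, state_dim))
a = rng.normal(size=(32, action_dim))
s_next = rng.normal(size=(32, state_dim))

# The same transitions relabeled with two different reward functions.
rewards = {"task_a": s[:, 0], "task_b": -s[:, 1]}

# The dynamics loss is identical for both tasks because it ignores `rewards`.
losses = {task: dynamics_loss(s, a, s_next) for task in rewards}
```

Because `dynamics_loss` takes only `(s, a, s_next)`, transitions from every task can train the shared encoder through the same objective, while `task_value` is the only part that differs per task.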

## References
- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[predictive-representation-learning|Predictive Representation Learning]]
- [[multitask-rl|Multitask RL]]
- [[auxiliary-predictive-objectives|Auxiliary Predictive Objectives]]