---
title: "Representation Learning in RL"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["deep-rl", "representation-learning", "self-supervised-learning"]
sources: ["[[predictive-representations-scalable-mtrl]]"]
---
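# Representation Learning in RL
In deep RL, **representation learning** concerns how to learn state/observation representations that are useful for decision-making, rather than relying on the reward signal alone.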
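## Why reward supervision is not enough
- **Sparsity**: the reward signal can be extremely sparse (in Go, for example, it arrives only at the end of the game)
- **Non-stationarity**: policy updates → shifts in the data distribution → previously learned representations go stale
- **TD variance**: poor representations amplify bootstrapping error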
## Sources of learning signal for representations
### 1. Reconstruction objectives
Learn an encoder-decoder pair: z_t = encoder(s_t), with decoder(z_t) ≈ s_t
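
A reconstruction objective can be sketched in a few lines. This is a minimal illustration only: the linear encoder and decoder, the weight names `W_enc` and `W_dec`, and the mean-squared-error loss are assumptions chosen for brevity, not something this note prescribes.

```python
def encode(W_enc, s):
    """Linear encoder (illustrative): z = W_enc @ s."""
    return [sum(w * x for w, x in zip(row, s)) for row in W_enc]


def decode(W_dec, z):
    """Linear decoder (illustrative): s_hat = W_dec @ z."""
    return [sum(w * x for w, x in zip(row, z)) for row in W_dec]


def reconstruction_loss(W_enc, W_dec, s):
    """Mean squared error between s and decoder(encoder(s))."""
    s_hat = decode(W_dec, encode(W_enc, s))
    return sum((a - b) ** 2 for a, b in zip(s, s_hat)) / len(s)


# Hypothetical weights: a 2-d latent that keeps the first two of three state dims.
W_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_dec = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

# The third dimension is zero here, so nothing is lost in the bottleneck.
loss_kept = reconstruction_loss(W_enc, W_dec, [1.0, 2.0, 0.0])
# The third dimension is discarded by the encoder, so the loss is positive.
loss_lost = reconstruction_loss(W_enc, W_dec, [1.0, 2.0, 3.0])
```

The bottleneck forces the latent to keep whatever is needed to rebuild the observation, whether or not that information matters for control.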
### 2. Contrastive objectives
Pull positive pairs together and push negative pairs apart in representation space, SimCLR-style
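
As a rough sketch, the contrastive idea can be written as an InfoNCE-style loss for a single anchor, the family that SimCLR's loss belongs to. The function names, the cosine similarity, and the temperature value of 0.1 are illustrative assumptions. SimCLR itself computes a symmetric version over all augmented pairs in a batch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


anchor = [1.0, 0.0]
# Positive lies close to the anchor: low loss.
loss_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Positive is orthogonal while a negative lies close: high loss.
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

In an RL setting the positive pair is typically two augmented views of the same observation, though other pairings are possible.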
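### 3. [[auxiliary-predictive-objectives|Predictive objectives]]
Predict future states and rewards from the current latent and action: z_{t+1}, r_t, d_t ← (z_t, a_t), where d_t is presumably the episode-termination (done) flag
Predictive objectives are the core method of [[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]], where they are reported to be crucial to scaling behavior.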
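
The mapping (z_t, a_t) → (z_{t+1}, r_t, d_t) can be sketched as three prediction heads with one combined loss. This is not the implementation from the cited paper. The linear heads, the `params` dictionary layout, the equal loss weights, and the reading of d_t as a binary termination flag are all assumptions made for illustration.

```python
import math


def linear(W, x):
    """Matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predictive_loss(params, z_t, a_t, z_next, r_t, d_t):
    """Next-latent MSE + reward squared error + termination cross-entropy."""
    x = z_t + a_t  # list concatenation of latent and one-hot action
    z_pred = linear(params["dynamics"], x)
    r_pred = linear(params["reward"], x)[0]
    d_prob = sigmoid(linear(params["done"], x)[0])

    dyn = sum((p - t) ** 2 for p, t in zip(z_pred, z_next)) / len(z_next)
    rew = (r_pred - r_t) ** 2
    eps = 1e-8  # keeps log() finite when d_prob saturates
    done = -(d_t * math.log(d_prob + eps)
             + (1 - d_t) * math.log(1.0 - d_prob + eps))
    return dyn + rew + done


# Hypothetical hand-set weights for a 2-d latent and a 2-action one-hot input.
params = {
    "dynamics": [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]],
    "reward": [[0.5, 0.0, 0.0, 0.0]],
    "done": [[0.0, 0.0, 0.0, -20.0]],
}
z_t, a_t = [1.0, 0.0], [0.0, 1.0]

# Targets that match the heads' predictions give a near-zero loss.
loss_match = predictive_loss(params, z_t, a_t, [1.0, 1.0], 0.5, 0)
# A mismatched next latent raises the dynamics term.
loss_mismatch = predictive_loss(params, z_t, a_t, [0.0, 0.0], 0.5, 0)
```

Only the reward head depends on a task's reward function. The dynamics and termination terms can be computed from transitions alone.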
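## Measuring representation quality
- **Linear probing**: train a linear classifier on top of frozen representations
- **Few-shot fine-tuning**: evaluate how quickly the representation adapts to a new task
- **Neuron-level analysis**: the fraction of dead neurons (an indicator of representation collapse)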
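
Of these metrics, the dead-neuron fraction is the simplest to compute. The sketch below assumes a "dead" unit is one whose post-ReLU activation is at most `tol` on every input in a batch. That definition and the function name are illustrative. Other thresholds or normalized scores are also used in practice.

```python
def dead_neuron_fraction(activations, tol=0.0):
    """Fraction of units whose activation is <= tol on every input in the batch.

    `activations` is a list of rows, one row of post-ReLU unit outputs per input.
    """
    num_units = len(activations[0])
    dead = sum(
        all(row[j] <= tol for row in activations)
        for j in range(num_units)
    )
    return dead / num_units


# Three inputs, four units: units 0 and 2 never fire.
batch = [
    [0.0, 1.2, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.5, 0.0, 0.0],
]
fraction = dead_neuron_fraction(batch)
```

A fraction that rises over training suggests the network is losing effective capacity, which is one symptom of representation collapse.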
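## The special role in multitask RL
The multitask setting raises the demands on representations: a shared representation must generalize across tasks. [[predictive-representation-learning|Predictive representation learning]] is task-agnostic (dynamics prediction does not depend on any particular reward function), which makes it a natural fit for multitask transfer.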
## References
- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
- [[predictive-representation-learning|Predictive Representation Learning]]
- [[multitask-rl|Multitask RL]]
- [[auxiliary-predictive-objectives|Auxiliary Predictive Objectives]]