SidneyZhang/myWiki

Sidney Zhang 91fac5b6fc

20260617: currently 914 pages

2026-06-17 15:02:40 +08:00

title: Representation Learning in RL
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: deep-rl, representation-learning, self-supervised-learning
sources: predictive-representations-scalable-mtrl

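Representation Learning in RL

In deep RL, representation learning is concerned with learning state/observation representations that are useful for decision-making, rather than relying on the reward signal alone.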

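Why reward supervision is not enough

Sparsity: reward signals can be extremely sparse (in Go, for example, reward arrives only at the end of the game)
Non-stationarity: policy updates → data distribution shifts → old representations become stale
TD variance: poor representations amplify bootstrapping errors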

Sources of signal for representation learning

1. Reconstruction objectives (Reconstruction)

Learn an encoder-decoder pair: z_t = encoder(s_t), s_t ≈ decoder(z_t)
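As a minimal sketch of the reconstruction objective, the pure-Python snippet below uses a linear encoder and decoder and scores them with mean squared error. The linear maps, the dimensions, and the names `W_enc`, `W_dec`, and `reconstruction_loss` are illustrative assumptions rather than anything specified in the source; a real agent would typically use deep networks trained by gradient descent.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def encoder(W_enc, s):
    """z_t = encoder(s_t): linear map from observation to latent."""
    return matvec(W_enc, s)

def decoder(W_dec, z):
    """s_hat_t = decoder(z_t): linear map from latent back to observation."""
    return matvec(W_dec, z)

def reconstruction_loss(W_enc, W_dec, s):
    """Mean squared error between s_t and decoder(encoder(s_t))."""
    s_hat = decoder(W_dec, encoder(W_enc, s))
    return sum((a - b) ** 2 for a, b in zip(s, s_hat)) / len(s)

# Hypothetical setup: a 4-dimensional observation compressed to 2 latent dims.
rng = random.Random(0)
obs_dim, latent_dim = 4, 2
W_enc = [[rng.gauss(0, 0.5) for _ in range(obs_dim)] for _ in range(latent_dim)]
W_dec = [[rng.gauss(0, 0.5) for _ in range(latent_dim)] for _ in range(obs_dim)]

s_t = [1.0, 0.0, -1.0, 0.5]
loss = reconstruction_loss(W_enc, W_dec, s_t)
print(f"reconstruction loss: {loss:.4f}")
```

Minimizing this loss with respect to both weight matrices pushes the latent z_t to retain whatever information is needed to rebuild s_t, whether or not that information is relevant to reward.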

2. Contrastive objectives (Contrastive)

Positive pairs vs. negative pairs, in the style of SimCLR
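The positive-versus-negative idea can be sketched with a simplified InfoNCE-style loss for a single anchor. This is an assumption-laden illustration, not SimCLR's exact batch formulation: the function names, the temperature value, and the hand-written embeddings are all hypothetical, and the negatives are passed in explicitly instead of being drawn from the rest of a batch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """Negative log of the softmax weight assigned to the positive pair.

    loss = -log( e^{sim(a,p)/T} / (e^{sim(a,p)/T} + sum_n e^{sim(a,n)/T}) )
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical embeddings: the positive is a slightly perturbed view of the anchor.
z_anchor = [1.0, 0.0]
z_positive = [0.9, 0.1]
z_negatives = [[-1.0, 0.0], [0.0, 1.0]]

loss_aligned = info_nce(z_anchor, z_positive, z_negatives)
loss_misaligned = info_nce(z_anchor, [-0.9, 0.1], z_negatives)
print(loss_aligned, loss_misaligned)
```

The loss is lower when the positive embedding points in the same direction as the anchor than when it points away, which is the pressure that pulls positive pairs together and pushes negatives apart.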

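3. Predictive objectives (auxiliary-predictive-objectives)

Predict the next latent state, the reward, and d_t (presumably the episode-termination flag) from the current latent state and action: z_{t+1}, r_t, d_t ← (z_t, a_t)

Predictive objectives are the core method of predictive-representations-scalable-mtrl and have been shown to be critical for scaling behavior.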
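A rough sketch of the mapping (z_t, a_t) → (z_{t+1}, r_t, d_t) is given below with linear prediction heads. It assumes that d_t is a binary termination flag, and it combines squared error on the latent and reward with binary cross-entropy on the flag using equal weights; the parameter names and the loss weighting are hypothetical and are not taken from predictive-representations-scalable-mtrl.

```python
import math

def linear(W, b, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predictive_loss(params, z_t, a_t, z_next, r_t, d_t):
    """Sum of latent-dynamics, reward, and termination prediction losses."""
    x = z_t + a_t  # concatenate latent state and action (both are lists)
    z_pred = linear(params["W_z"], params["b_z"], x)
    r_pred = linear(params["W_r"], params["b_r"], x)[0]
    d_prob = sigmoid(linear(params["W_d"], params["b_d"], x)[0])

    dynamics = sum((p - t) ** 2 for p, t in zip(z_pred, z_next)) / len(z_next)
    reward = (r_pred - r_t) ** 2
    eps = 1e-7  # clamp to keep the logarithms finite
    d_prob = min(max(d_prob, eps), 1.0 - eps)
    done = -(d_t * math.log(d_prob) + (1 - d_t) * math.log(1.0 - d_prob))
    return dynamics + reward + done

# Hypothetical sizes: 2 latent dims, 1 action dim, all heads initialized to zero.
params = {
    "W_z": [[0.0] * 3 for _ in range(2)], "b_z": [0.0, 0.0],
    "W_r": [[0.0] * 3], "b_r": [0.0],
    "W_d": [[0.0] * 3], "b_d": [0.0],
}
loss = predictive_loss(params, z_t=[0.2, -0.1], a_t=[1.0],
                       z_next=[0.0, 0.0], r_t=0.0, d_t=0)
print(loss)
```

With zero-initialized heads and zero targets, the dynamics and reward terms vanish and only the termination term remains, since the model predicts a termination probability of 0.5.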

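Measuring representation quality

Linear probing: train a linear classifier on top of frozen representations
Few-shot fine-tuning: evaluate adaptation speed on new tasks
Neuron analysis: fraction of dead neurons (an indicator of representation collapse)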
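Two of these diagnostics, the dead-neuron fraction and the linear probe, can be illustrated on a toy feature matrix. The sketch below assumes post-ReLU features, treats a unit as dead when it is zero on every sample in the batch, and uses plain logistic regression trained by SGD as the probe; the feature values, labels, and hyperparameters are made up for illustration.

```python
import math

def dead_neuron_fraction(activations, tol=0.0):
    """Fraction of units whose activation is <= tol on every sample."""
    n_units = len(activations[0])
    dead = sum(
        1 for j in range(n_units)
        if all(sample[j] <= tol for sample in activations)
    )
    return dead / n_units

def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit logistic regression on fixed features; the features are never updated."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-logit(w, b, x)))
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def probe_accuracy(w, b, features, labels):
    correct = sum(
        1 for x, y in zip(features, labels)
        if (logit(w, b, x) > 0) == (y == 1)
    )
    return correct / len(labels)

# Hypothetical frozen features (4 samples, 4 units); units 0 and 2 never fire.
features = [
    [0.0, 1.2, 0.0, 0.3],
    [0.0, 0.9, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.1, 0.0, 0.9],
]
labels = [1, 1, 0, 0]

print(dead_neuron_fraction(features))  # 0.5
w, b = train_linear_probe(features, labels)
print(probe_accuracy(w, b, features, labels))
```

A real evaluation would compute the probe accuracy on held-out data and estimate the dead-neuron fraction over a much larger batch, since a unit that is silent on a handful of samples may still fire elsewhere.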

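Special role in multi-task RL

The multi-task setting intensifies the demands on representations: a shared representation must generalize across tasks. Because predictive-representation-learning is task-agnostic (dynamics prediction does not depend on any particular reward function), it is naturally suited to multi-task transfer.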

References