20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/representation-learning-rl.md
+++ b/concepts/representation-learning-rl.md
@@ -0,0 +1,47 @@
+---
+title: "Representation Learning in RL"
+created: 2026-06-10
+updated: 2026-06-10
+type: concept
+tags: ["deep-rl", "representation-learning", "self-supervised-learning"]
+sources: ["[[predictive-representations-scalable-mtrl]]"]
+---
+
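+# Representation Learning in RL
+
+In deep RL, **representation learning** is concerned with learning state/observation representations that are useful for decision-making, using training signals beyond the reward alone.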
+
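+## Why reward supervision is not enough
+
+- **Sparsity**: reward signals can be extremely sparse (in Go, for example, the reward arrives only at the end of the game)
+- **Non-stationarity**: policy updates change the data distribution, so previously learned representations can become stale
+- **TD variance**: poor representations amplify bootstrapping errors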
+
+## Sources of signal for representation learning
+
+### 1. Reconstruction objectives
+Learn an encoder-decoder pair: z_t = encoder(s_t), with decoder(z_t) ≈ s_t
+
+### 2. Contrastive objectives
+Pull positive pairs together and push negative pairs apart in embedding space, in the style of SimCLR
+
+### 3. [[auxiliary-predictive-objectives|Predictive objectives]]
+Predict the next latent state, the reward, and d_t (presumably the episode-termination flag): z_{t+1}, r_t, d_t ← (z_t, a_t)
+
+Predictive objectives are the core method of [[predictive-representations-scalable-mtrl|Obando-Ceron et al. (2026)]], where they are reported to be critical to scaling behavior.
+
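+## Measuring representation quality
+
+- **Linear probing**: train a linear classifier on top of frozen representations
+- **Few-shot fine-tuning**: evaluate how quickly the representation adapts to a new task
+- **Neuron-level analysis**: the fraction of dead neurons (an indicator of representation collapse)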
+
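+## A special role in multitask RL
+
+The multitask setting intensifies the demands on representations: a shared representation must generalize across tasks. Because [[predictive-representation-learning|predictive representation learning]] is task-agnostic (dynamics prediction does not depend on any particular reward function), it is naturally suited to multitask transfer.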
+
+## References
+- [[predictive-representations-scalable-mtrl|Scalable Multitask Deep RL]]
+- [[predictive-representation-learning|Predictive Representation Learning]]
+- [[multitask-rl|Multitask RL]]
+- [[auxiliary-predictive-objectives|Auxiliary Predictive Objectives]]
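
Note on the added page (not part of the commit): the objectives and the dead-neuron metric it lists can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the method of the cited paper. The function names `mse`, `cosine`, `info_nce`, and `dead_neuron_ratio` are hypothetical, vectors are plain lists, and the encoder, decoder, and dynamics model are assumed to exist elsewhere and to have already produced the vectors passed in. The contrastive loss uses cosine similarity with a temperature, as SimCLR's NT-Xent does, but for a single anchor only.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length vectors.

    Serves both the reconstruction loss (decoder output vs. s_t)
    and the predictive loss (predicted vs. actual z_{t+1})."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

def dead_neuron_ratio(activations):
    """Fraction of units that output exactly 0 for every sample.

    activations: one list of post-ReLU activations per sample."""
    n_units = len(activations[0])
    dead = sum(all(row[j] == 0.0 for row in activations) for j in range(n_units))
    return dead / n_units

# Reconstruction / prediction error on toy vectors.
recon = mse([1.0, 2.0], [1.0, 4.0])

# An aligned positive should score a lower loss than an orthogonal one.
good = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])

# Unit 0 is silent on both samples, so one of three units is dead.
ratio = dead_neuron_ratio([[0.0, 1.0, 0.0], [0.0, 0.0, 2.0]])
```

In practice these losses would be computed on batched tensors in an autodiff framework and added to the RL loss with a weighting coefficient; the pure-Python form is only meant to make the definitions concrete.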