20260625: lots of new content

2026-06-25 14:08:47 +08:00
parent 91fac5b6fc
commit 6021dea160
375 changed files with 19263 additions and 251 deletions
--- a/papers/latent-cot-supervision.md
+++ b/papers/latent-cot-supervision.md
@@ -0,0 +1,77 @@
+---
+title: "What Makes Effective Supervision in Latent Chain-of-Thought"
+created: 2026-06-25
+updated: 2026-06-25
+type: paper
+tags: [latent-cot, information-theory, mutual-information, reasoning, supervision, representation-learning]
+sources:
+  - https://arxiv.org/abs/2606.20075
+  - https://github.com/EIT-NLP/Supervision-in-Latent-CoT
+---
+
+# Latent CoT Supervision
+
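+**Latent CoT Supervision** is an ICML 2026 paper (Chen et al.) that analyzes, from an information-theoretic perspective, what makes supervision effective for latent Chain-of-Thought (Latent CoT). Its core contributions are identifying why outcome supervision fails and decomposing process supervision into two complementary dimensions.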
+
+## Key Findings
+
+### 1. The Dual Collapse of Outcome Supervision
+
+Training Latent CoT with only the final-answer loss fails through two mechanisms:
+
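+| Mechanism | Symptom | Consequence |
+|------|------|------|
+| **[[dual-collapse\|Gradient decay]]** | The supervision signal concentrates on latent position L1; gradients at L2...L6 are close to zero | The model relies on early latent positions, and later positions do not take part in reasoning |
+| **[[dual-collapse\|Representation drift]]** | Latent states move away from the semantic reference region during training | Semantic anchoring is lost and the states enter unstructured regions |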
+
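+The two interact: gradient decay leaves the later latent states under-trained → they "drift" in representation space → the answer loss ends up being minimized through a shortcut rather than through genuine multi-step reasoning.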
+
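+### 2. A Two-Dimensional Decomposition of Process Supervision
+
+**[[trajectory-supervision|Trajectory Supervision]]**:
+- Injects the reasoning signal step by step: at training stage k, the first k steps use continuous latent states L_{≤k} and the remaining steps use explicit tokens
+- Objective: maximize the local mutual information I(L_{≤k}; S_{k+1})
+- Key finding: Trajectory Supervision alone (without Space Supervision) already clearly outperforms outcome-only training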
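+
+One way to write the stage-k objective, as a sketch under assumptions rather than the paper's exact loss: x is the question, K is the total number of reasoning steps, S_{k+1}, ..., S_K are the steps still kept as explicit text, and p_θ is the model being trained (x, K, and p_θ are names introduced here for illustration):
+
+$$
+\mathcal{L}_{\text{traj}}^{(k)} = -\sum_{j=k+1}^{K} \log p_\theta\left(S_j \mid x, L_{\le k}, S_{k+1}, \dots, S_{j-1}\right)
+$$
+
+The j = k+1 term is the one tied to I(L_{≤k}; S_{k+1}): predicting the next explicit step well requires the latent prefix to carry information about it.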
+
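+**[[space-supervision|Space Supervision]]**:
+- **[[geometric-compression-latent|Geometric Compression (GC)]]**: aligns latent states to static embeddings with an MSE loss → a **destructive constraint** that collapses the high-dimensional reasoning manifold
+- **[[generative-reconstruction-latent|Generative Reconstruction (GR)]]**: an auxiliary decoder recovers the text from the latent states → **semantic anchoring** that preserves information capacity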
+
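+The information-theoretic advantage of GR: its reconstruction loss upper-bounds H(S_t | L_t), so minimizing it maximizes a variational lower bound on I(L_t; S_t).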
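+
+Written out, the bound follows from the standard variational (Barber-Agakov) argument. This is a sketch rather than the paper's own derivation; the auxiliary decoder is written q_φ here, borrowing the notation this page uses for the ULP decoder, and H(S_t) is assumed to be fixed by the data:
+
+$$
+I(L_t; S_t) = H(S_t) - H(S_t \mid L_t) \ge H(S_t) - \mathbb{E}_{p(L_t, S_t)}\left[-\log q_\phi(S_t \mid L_t)\right]
+$$
+
+Because H(S_t) does not depend on the decoder, lowering the reconstruction cross-entropy on the right-hand side raises the lower bound on I(L_t; S_t).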
+
+### 3. Unified Latent Probe (ULP)
+
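+[[unified-latent-probe|ULP]] is a lightweight decoder q_φ(S_t | L_t), trained on the latent states of every baseline with the underlying model frozen.
+Its reconstruction loss L_Info serves as a rigorous information measure:
+- Low L_Info → the latent states retain recoverable reasoning semantics
+- High L_Info → the latent states have degenerated into a high-entropy, unstructured region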
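+
+Why L_Info tracks recoverable information can be made explicit with the standard cross-entropy decomposition. This is a sketch, not a formula taken from the paper: it assumes L_Info is the expected negative log-likelihood of the probe, and p denotes the true conditional distribution of the step given the latent state:
+
+$$
+\mathcal{L}_{\text{Info}} = \mathbb{E}_{p(L_t, S_t)}\left[-\log q_\phi(S_t \mid L_t)\right] = H(S_t \mid L_t) + \mathbb{E}_{p(L_t)}\left[D_{\mathrm{KL}}\big(p(\cdot \mid L_t) \,\|\, q_\phi(\cdot \mid L_t)\big)\right]
+$$
+
+The KL term is non-negative, so L_Info never falls below H(S_t | L_t); the gap shrinks as the probe becomes more expressive.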
+
+### 4. Information-Performance Binding
+
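+[[information-performance-binding]]: reasoning accuracy and ULP reconstruction loss are strictly **inversely related**. In other words, reasoning ability is bounded above by the mutual information carried in the latent chain.
+
+In the experiments, PS-GR (Trajectory + Generative Reconstruction) reaches the best frontier: it maximizes I(L_t; S_t) while preserving the predictive information I(L_{≤k}; S_{k+1}).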
+
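+## Methodology Notes
+
+- **Progressive Training**: moves gradually from fully explicit CoT to fully latent CoT
+- **Granularity g**: g tokens are merged into one latent vector. g=1 (one latent vector per token) works best but is computationally expensive
+- **Optimizer reset**: the optimizer is reset at the transition to continuous states → the resulting "exploration shock" helps escape local optima
+- **Information decay**: autoregressive latent generation loses information position by position; GR's reconstructability constraint periodically "resets" the semantic drift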
+
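+## Limitations
+
+- Experiments are limited to GPT-2 scale; validation on larger models is needed
+- Relies on process annotations (ground-truth reasoning steps), which limits scalability
+- MI estimates are limited by the capacity of the variational probe and may be conservative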
+
+## References
+
+- [Raw archive](raw/papers/latent-cot-supervision-2026.md)
+- [[dual-collapse]]
+- [[trajectory-supervision]]
+- [[space-supervision]]
+- [[unified-latent-probe]]
+- [[information-performance-binding]]
+- [[generative-reconstruction-latent]]
+- [[geometric-compression-latent]]