20260625: lots of new content
57
concepts/dual-collapse.md
Normal file
@@ -0,0 +1,57 @@
---
title: "Dual Collapse in Latent CoT"
created: 2026-06-25
updated: 2026-06-25
type: concept
tags: [latent-cot, optimization, gradient-flow, representation-drift, supervision]
sources:
- "[[latent-cot-supervision]]"
---

# Dual Collapse in Latent CoT

**Dual Collapse** is the root-cause mechanism behind the failure of Outcome Supervision, as diagnosed in the [[latent-cot-supervision|Latent CoT Supervision]] paper. It consists of two coupled degenerative processes.

## Component 1: Gradient Attenuation

When only the final answer loss is used, the backpropagated gradient attenuates along the latent chain:

```
G(t) = ||∂L_OS / ∂L_t||
```
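
Here `L_OS` is the Outcome Supervision (answer) loss and `L_t` is the t-th latent state. Empirical finding: G(1) >> G(2) > ... > G(6) ≈ 0.

**Consequences:**

- The model relies on L1 to carry almost the entire reasoning burden (a structural shortcut).
- The deeper latent states are effectively left untrained.
- This resembles gradient starvation (Pezeshki et al., 2021): dominant shallow features suppress the learning of deeper dependencies.
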
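To make the diagnostic G(t) concrete, the following is a minimal sketch of how a per-step gradient-norm profile could be measured. Everything about the setup is an assumption made for illustration and is not taken from the paper: the toy `answer_loss`, the geometrically decaying `readout_weights`, the latent values, and the use of finite differences instead of an autograd framework.

```python
import math

def answer_loss(latents, readout_weights, target):
    """Toy outcome loss (hypothetical): squared error of a weighted readout."""
    prediction = sum(w * sum(h) for w, h in zip(readout_weights, latents))
    return 0.5 * (prediction - target) ** 2

def gradient_norm_profile(latents, loss_fn, eps=1e-5):
    """Return [G(1), ..., G(T)]: the L2 norm of dLoss/dL_t for each latent state,
    estimated with central finite differences."""
    norms = []
    for state in latents:
        squared = 0.0
        for i in range(len(state)):
            original = state[i]
            state[i] = original + eps
            loss_plus = loss_fn(latents)
            state[i] = original - eps
            loss_minus = loss_fn(latents)
            state[i] = original  # restore the perturbed coordinate
            squared += ((loss_plus - loss_minus) / (2 * eps)) ** 2
        norms.append(math.sqrt(squared))
    return norms

# Hypothetical data: six 4-dimensional latent states, and a readout that
# leans on the first latent far more than on the later ones.
latents = [[0.1 * (t + 1)] * 4 for t in range(6)]
readout_weights = [0.5 ** t for t in range(6)]

profile = gradient_norm_profile(
    latents, lambda hs: answer_loss(hs, readout_weights, target=1.0)
)
for t, g in enumerate(profile, start=1):
    print(f"G({t}) = {g:.4f}")
```

In a real model the same profile would more likely be read off from autograd gradients on the latent states; the finite-difference version is used here only to keep the sketch dependency-free.
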
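## Component 2: Representational Drift / Manifold Drift

Because the deeper latent states lack an effective gradient signal, their representations drift during training away from the semantic space defined by the explicit CoT embeddings:

- PCA visualizations show the latent trajectories diverging outward from the semantic reference region.
- The area ratio reaches 460.3×: the region explored in latent space is far larger than the semantically valid region.
- Having lost their semantic anchoring, the latent states enter unstructured, high-entropy regions.
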
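The note does not say how the area ratio is computed. As one plausible reading, the sketch below compares the convex-hull areas of two 2D point clouds that are assumed to be already PCA-projected. The function names and the sample points are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as ordered vertices."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

def area_ratio(latent_points, reference_points):
    """Hull area of the latent trajectory divided by hull area of the reference region."""
    reference_area = polygon_area(convex_hull(reference_points))
    if reference_area == 0.0:
        raise ValueError("reference region has zero area")
    return polygon_area(convex_hull(latent_points)) / reference_area

# Hypothetical 2D projections: a unit-square reference region and a
# latent trajectory that spreads over a 3 x 3 square.
reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
latent = [(-1.0, -1.0), (2.0, -1.0), (2.0, 2.0), (-1.0, 2.0), (0.3, 0.7)]

ratio = area_ratio(latent, reference)
print(f"area ratio = {ratio:.1f}x")
```

A bounding-box or density-based area would be an equally reasonable choice; the convex hull is used here only because it is simple to state and verify.
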
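## Interaction Effects

The coupling of the two mechanisms forms a vicious cycle:

1. Gradient attenuation → the deeper latent states go untrained
2. Untrained latent states drift → their contribution to the answer loss degrades
3. Degraded contribution → less gradient is allocated to them → further attenuation

End result: the model minimizes the loss through a shortcut rather than through genuine multi-step reasoning.
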
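## Solution

Process Supervision breaks this cycle along two dimensions:

- [[trajectory-supervision|Trajectory Supervision]]: injects a local gradient signal at every reasoning step, countering gradient attenuation
- [[space-supervision|Space Supervision]]: anchors the latent states through generative reconstruction, preventing representational drift
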
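The effect of a per-step signal can be illustrated with a small sketch that covers only the Trajectory Supervision side. The actual loss forms are not given in this note, so everything here is an assumption: the weighted-readout outcome loss, the squared-error per-step term, the weight `lam`, and all numeric values. Space Supervision (generative reconstruction) is not modeled.

```python
import math

def outcome_grad(latents, readout_weights, target):
    """dL_OS/dL_t for the toy loss 0.5 * (sum_t w_t * sum(L_t) - target)^2."""
    error = sum(w * sum(h) for w, h in zip(readout_weights, latents)) - target
    return [[error * w for _ in h] for w, h in zip(readout_weights, latents)]

def trajectory_grad(latents, step_targets):
    """Gradient of the per-step term sum_t 0.5 * ||L_t - target_t||^2."""
    return [[x - y for x, y in zip(h, tgt)] for h, tgt in zip(latents, step_targets)]

def step_grad_norms(latents, readout_weights, target, step_targets, lam):
    """Per-step norm of the total gradient; lam = 0 recovers outcome-only training."""
    g_out = outcome_grad(latents, readout_weights, target)
    g_traj = trajectory_grad(latents, step_targets)
    return [
        math.sqrt(sum((a + lam * b) ** 2 for a, b in zip(go, gt)))
        for go, gt in zip(g_out, g_traj)
    ]

# Hypothetical data: six 4-dimensional latents, a readout dominated by the
# first latent, and a per-step target vector for every latent.
latents = [[0.1 * (t + 1)] * 4 for t in range(6)]
readout_weights = [0.5 ** t for t in range(6)]
step_targets = [[1.0] * 4 for _ in range(6)]

outcome_only = step_grad_norms(latents, readout_weights, 1.0, step_targets, lam=0.0)
with_process = step_grad_norms(latents, readout_weights, 1.0, step_targets, lam=1.0)

for t, (a, b) in enumerate(zip(outcome_only, with_process), start=1):
    print(f"step {t}: outcome-only {a:.4f} | with per-step term {b:.4f}")
```

In this toy setting the last latent receives almost no gradient from the answer loss alone, while the per-step term gives it a signal that does not depend on the readout weight.
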
## References

- [[latent-cot-supervision]]
- [[trajectory-supervision]]
- [[space-supervision]]