20260625: lots of new content
21
raw/papers/latent-cot-supervision-2026.md
Normal file
@@ -0,0 +1,21 @@
# What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
- **arXiv**: 2606.20075v1
- **Published**: 2026-06-18
- **Authors**: Xinghao Chen, Chak Tou Leong, Wenjin Guo, Jian Wang, Wenjie Li, Xiaoyu Shen (Eastern Institute of Technology / Hong Kong Polytechnic University)
- **Categories**: cs.LG, cs.CL
- **Venue**: ICML 2026
- **Code**: https://github.com/EIT-NLP/Supervision-in-Latent-CoT
- **Source**: https://arxiv.org/abs/2606.20075
## Abstract
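Analyzes what makes supervision effective in Latent Chain-of-Thought from an information-theoretic perspective. Identifies a "dual collapse" under outcome supervision: gradient attenuation and representation drift. Decomposes process supervision into two complementary dimensions: Trajectory Supervision (injecting dense, step-by-step reasoning signals) and Space Supervision (preserving the semantic structure of the latent space through generative reconstruction). Proposes the Unified Latent Probe (ULP) to quantify the mutual information between latent trajectories and explicit reasoning steps. Experiments reveal an Information-Performance Binding: reasoning accuracy is strictly bounded by the information fidelity retained in the latent chain.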
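To make the idea of "quantifying mutual information with a probe" concrete, here is a minimal sketch of the standard variational (Barber-Agakov) lower bound, I(Z; S) >= H(S) - CE(probe), estimated from a probe's predicted probabilities. This is not the paper's ULP; the function names (`label_entropy`, `probe_cross_entropy`, `mi_lower_bound`) and the toy data are assumptions made purely for illustration, and the empirical estimate only approximates the bound on held-out data.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Empirical entropy H(S) of the discrete labels, in nats."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def probe_cross_entropy(probe_probs, labels):
    """Mean negative log-probability the probe assigns to the true label."""
    total = sum(math.log(p[y]) for p, y in zip(probe_probs, labels))
    return -total / len(labels)


def mi_lower_bound(probe_probs, labels):
    """Hypothetical helper: H(S) - CE(probe), a lower-bound estimate of I(Z; S)."""
    return label_entropy(labels) - probe_cross_entropy(probe_probs, labels)


# Toy data (illustrative only): two balanced reasoning-step labels.
labels = [0, 1, 0, 1]
# A probe that reads nothing from the latent state predicts uniformly.
uninformed = [[0.5, 0.5]] * 4
# A probe that recovers the step from the latent state predicts sharply.
informed = [[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.01, 0.99]]

print(mi_lower_bound(uninformed, labels))
print(mi_lower_bound(informed, labels))
```

The uninformed probe yields a bound of zero, while the informed probe approaches the label entropy log 2, which is the largest value the mutual information can take for a balanced binary label.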
## Key Contributions
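1. Information-theoretic analysis framework: formalizes Latent CoT supervision as a mutual information maximization problem
2. Dual-collapse diagnosis: gradient attenuation plus representation drift are the root causes of outcome supervision's failure
3. Two-dimensional decomposition of process supervision: Trajectory Supervision × Space Supervision
4. ULP probe: quantifies the recoverable reasoning information in latent states
5. Information-Performance Binding: reasoning ability is strictly bounded by information fidelity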
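
As a rough illustration of why accuracy can be bounded by retained information, the classical Fano inequality links the error of any predictor to mutual information. This is a textbook result, not the paper's own statement of the binding; the symbols $Z$ (latent chain), $S$ (target, taking values in a finite set $\mathcal{S}$ with $|\mathcal{S}| \ge 2$), and $P_e$ (error probability of any estimator of $S$ from $Z$) are assumptions introduced here.

```latex
% Fano's inequality (entropies in nats), with h_b the binary entropy function:
H(S \mid Z) \;\le\; h_b(P_e) + P_e \log\bigl(|\mathcal{S}| - 1\bigr)

% Using I(Z;S) = H(S) - H(S \mid Z), h_b(P_e) \le \log 2,
% and \log(|\mathcal{S}| - 1) \le \log|\mathcal{S}|:
P_e \;\ge\; \frac{H(S) - I(Z;S) - \log 2}{\log |\mathcal{S}|}
```

Read this way, any loss of mutual information between the latent chain and the target raises the floor on achievable error, which matches the direction of the claim in item 5.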