# What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
- **arXiv**: 2606.20075v1
- **Published**: 2026-06-18
- **Authors**: Xinghao Chen, Chak Tou Leong, Wenjin Guo, Jian Wang, Wenjie Li, Xiaoyu Shen (Eastern Institute of Technology / Hong Kong Polytechnic University)
- **Categories**: cs.LG, cs.CL
- **Venue**: ICML 2026
- **Code**: https://github.com/EIT-NLP/Supervision-in-Latent-CoT
- **Source**: https://arxiv.org/abs/2606.20075
## Abstract
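Analyzes what makes supervision effective in Latent Chain-of-Thought (Latent CoT) from an information-theoretic perspective. Identifies a "dual collapse" under outcome supervision: gradient attenuation and representation drift. Decomposes process supervision into two complementary dimensions. Trajectory Supervision injects dense step-by-step reasoning signals. Space Supervision preserves the semantic structure of the latent space through generative reconstruction. Proposes the Unified Latent Probe (ULP) to quantify the mutual information between latent trajectories and explicit reasoning steps. Experiments reveal an Information-Performance Binding: reasoning accuracy is strictly bounded by the information fidelity preserved in the latent chain.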
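The objective below is a rough illustration of how the two process-supervision dimensions could combine with outcome supervision in a single training loss. It is not the paper's formulation, and all notation is assumed:

- $z_t$ is the $t$-th latent thought.
- $r_t$ is the corresponding explicit reasoning step.
- $\ell_{\text{traj}}$ is a hypothetical per-step alignment loss (the trajectory term).
- $p_\psi$ is a hypothetical decoder that reconstructs $r_t$ from $z_t$ (the space term).
- $\lambda_{\text{traj}}$ and $\lambda_{\text{space}}$ are hypothetical weights.

```latex
\mathcal{L} = \mathcal{L}_{\text{out}}
  + \lambda_{\text{traj}} \sum_{t=1}^{T} \ell_{\text{traj}}(z_t, r_t)
  - \lambda_{\text{space}} \sum_{t=1}^{T} \log p_\psi(r_t \mid z_t)
```

Setting both weights to zero leaves pure outcome supervision. In that case only the final-answer loss $\mathcal{L}_{\text{out}}$ shapes every latent step, which is the regime where the dual collapse is reported.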
## Key Contributions
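1. Information-theoretic analysis framework: formalizes Latent CoT supervision as a mutual-information maximization problem
2. Dual-collapse diagnosis: gradient attenuation and representation drift are the root causes of outcome supervision's failure
3. Two-dimensional decomposition of process supervision: Trajectory Supervision × Space Supervision
4. ULP probe: quantifies the reasoning information recoverable from latent states
5. Information-Performance Binding: reasoning ability is strictly bounded by information fidelity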
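A probe can quantify recoverable information through a standard variational lower bound:

$I(Z;S) = H(S) - H(S \mid Z) \ge H(S) - \mathbb{E}[-\log q(s \mid z)]$

This holds for any probe distribution $q$, because a probe's cross-entropy can only overestimate the conditional entropy. Whether ULP uses exactly this estimator is an assumption.

Below is a minimal NumPy sketch. The inputs `labels` and `probe_probs` are hypothetical:

- `labels` holds discrete classes for the explicit reasoning steps.
- `probe_probs` holds a trained probe's predicted distribution over those classes for each latent state, ideally computed on held-out data.

```python
import numpy as np


def mi_lower_bound_nats(labels, probe_probs, eps=1e-12):
    """Variational lower bound on I(Z; S) in nats: H(S) - E[-log q(s | z)].

    labels: shape (n,), integer class of the explicit reasoning step paired
        with each latent state (hypothetical labeling scheme).
    probe_probs: shape (n, k), probe distribution q(s | z) for each latent state.
    """
    labels = np.asarray(labels, dtype=int)
    probe_probs = np.asarray(probe_probs, dtype=float)
    n, k = probe_probs.shape

    # Empirical marginal entropy H(S) of the step labels.
    counts = np.bincount(labels, minlength=k)
    p = counts[counts > 0] / n
    h_s = -np.sum(p * np.log(p))

    # Probe cross-entropy E[-log q(s | z)], an upper bound on H(S | Z).
    picked = probe_probs[np.arange(n), labels]
    cross_entropy = -np.mean(np.log(picked + eps))

    return float(h_s - cross_entropy)


# Usage: a perfect probe vs. an uninformative probe on balanced binary labels.
labels = [0, 1, 0, 1]
perfect = np.eye(2)[labels]           # one-hot: q(s | z) puts all mass on the true step
uninformative = np.full((4, 2), 0.5)  # ignores z entirely
print(mi_lower_bound_nats(labels, perfect))        # close to log(2) ≈ 0.693
print(mi_lower_bound_nats(labels, uninformative))  # close to 0.0
```

A probe that recovers the step labels perfectly pushes the estimate toward $H(S)$. An uninformative probe drives it to zero, and a miscalibrated one can push it below zero. Under the Information-Performance Binding, a low estimate would signal a ceiling on the reasoning accuracy the latent chain can support.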