20260625: lots of new content
62
reviews/mozer-topological-trouble-review-20260618.md
Normal file
@@ -0,0 +1,62 @@
---
title: "Review: The Topological Trouble With Transformers"
created: 2026-06-18
updated: 2026-06-18
type: review
source: mozer-topological-trouble-transformers-2026
---

# 📌 Basic Information

- **Paper title**: The Topological Trouble With Transformers
- **Authors**: Michael C. Mozer, Shoaib Ahmed Siddiqui, Rosanne Liu (Google DeepMind)
- **Field**: cs.LG, cs.AI
- **arXiv ID**: 2604.17121
- **Type**: Survey-style position paper
- **Added**: 2026-06-18

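# 🎯 Core Concepts

1. **[[state-tracking|State tracking]]**: iteratively updating latent variables that reflect a changing environment; a core capability for language understanding and reasoning
2. **[[feedforward-depth-limitation|Feedforward depth limitation]]**: a feedforward architecture forces the state representation to move up one layer at a time, eventually exhausting the model's depth
3. **[[recurrence-taxonomy|Recurrence taxonomy]]**: a two-dimensional scheme (recurrence axis × ratio of input steps to recurrence steps) that systematically classifies all recurrent Transformer architectures
4. **[[depth-recurrence|Depth recurrence]]**: recurrence along the layer-depth axis (Looped Transformer); it increases expressivity, but the state still moves upward
5. **[[step-recurrence|Step recurrence]]**: within-layer state propagation across input steps (Mamba, DeltaNet, RWKV-7)
6. **[[enhanced-state-space-models|Enhanced state space models]]**: SSMs whose expressivity exceeds that of the standard Transformer (e.g., the negative-eigenvalue extension of DeltaNet)
7. **[[latent-thought-models|Latent thought models]]**: multi-step autoregressive processing of a single token, without consuming the context window
8. **[[coarse-grained-recurrence|Coarse-grained recurrence]]**: recurrence at the sentence or chunk level, reducing the computational burden of token-level recurrence
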
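To make the state-tracking and step-recurrence entries above concrete, here is a minimal sketch in Python. The cup-and-ball task and the names `update_state` and `track` are illustrative assumptions, not taken from the paper. The sketch shows only the shape of the computation: one latent state, updated once per input step and carried forward.

```python
from typing import Iterable, List, Tuple

# Hypothetical toy task (an assumption for illustration, not from the paper):
# track which of three cups hides a ball while pairs of cups are swapped.
Swap = Tuple[int, int]


def update_state(ball: int, swap: Swap) -> int:
    """Apply one swap to the latent state (the ball's current cup)."""
    a, b = swap
    if ball == a:
        return b
    if ball == b:
        return a
    return ball


def track(initial_ball: int, swaps: Iterable[Swap]) -> List[int]:
    """Step recurrence: a single state is carried across input steps."""
    state = initial_ball
    history = [state]
    for swap in swaps:
        state = update_state(state, swap)
        history.append(state)
    return history
```

For example, `track(0, [(0, 1), (1, 2), (0, 1)])` returns `[0, 1, 2, 2]`: the ball moves from cup 0 to cup 1, then to cup 2, and the last swap does not touch it. The loop reuses the same update at every step, so the amount of computation per step stays constant no matter how long the sequence is.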
# 🔗 Concept Network

## Core Connections
```
state-tracking ← feedforward-depth-limitation ← depth-dilemma
↓
recurrent-transformer-architectures ← recurrence-taxonomy
↓ ↓
depth-recurrence step-recurrence ← state-space-models
↓ ↓
representational-alignment enhanced-state-space-models
↓
attractor-dynamics ← latent-thought-models
↓
coarse-grained-recurrence → sequential-dependency → autoregressive-unrolling
```

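## Extended Network
- Links 16 new concepts and reuses 1 existing concept (chain-of-thought)
- Core link density: 4-6 bidirectional links per concept on average
- Cross-concept connections established: depth ↔ step recurrence, state tracking ↔ belief state, taxonomy ↔ architectural components
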
# 📚 Wiki Integration

- **New pages**: 17 (1 paper + 16 concepts)
- **Reused pages**: 1 ([[chain-of-thought|Chain of thought]])
- **Link integrity**: pending verification
- **Total size change**: +17 pages

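# 💡 Key Insights

1. **The nature of the "topological trouble"**: the problem is not that Transformers "cannot do" state tracking. It is a **structural property** of the feedforward topology: the state must move up layer by layer. This is not an incidental engineering flaw but a necessary consequence of the architecture. This insight is more valuable than any specific solution.

2. **From externalization to internalization**: the paper's sharpest argument is that CoT, as a mechanism for "talking to oneself", is a strange fit for micro-cognition that humans perform automatically, such as deep disambiguation (e.g., the word sense of "bank"). The real direction is **implicit activation dynamics** rather than explicit thought traces, which fundamentally challenges the current paradigm of "more thinking tokens = better reasoning".
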
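The first insight can be illustrated with a deliberately crude toy model. The assumption below is ours, not the paper's: each sequential state update consumes exactly one additional layer of a feedforward stack, while a step-recurrent layer reuses the same layer at every step. The function names `feedforward_state_layers` and `recurrent_state_layers` are hypothetical.

```python
from typing import List, Optional


def feedforward_state_layers(num_steps: int, num_layers: int) -> List[Optional[int]]:
    """Layer holding the state after each update in a feedforward stack.

    Toy assumption: the state after update t can only be computed one layer
    above the state after update t-1. None marks updates that no longer fit.
    """
    layers: List[Optional[int]] = []
    current = 0  # layer 0 stands for the input embeddings (initial state)
    for _ in range(num_steps):
        current += 1
        layers.append(current if current <= num_layers else None)
    return layers


def recurrent_state_layers(num_steps: int, layer: int = 1) -> List[int]:
    """With step recurrence, the state stays in the same layer at every step."""
    return [layer] * num_steps
```

Under this assumption, `feedforward_state_layers(5, 3)` returns `[1, 2, 3, None, None]`: a 3-layer stack runs out of depth after three updates, whereas `recurrent_state_layers(5)` returns `[1, 1, 1, 1, 1]`. Real architectures are not bound to one update per layer, so treat this only as a picture of the "state moves upward" argument, not as a capacity estimate.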