20260625: lots of new content
40
concepts/depth-recurrence.md
Normal file
@@ -0,0 +1,40 @@
---
title: "Depth Recurrence"
created: 2026-06-18
updated: 2026-06-18
type: concept
tags: [transformers, recurrence, depth, inference-time-scaling]
sources:
- mozer-topological-trouble-transformers-2026
---

# Depth Recurrence

Depth recurrence is the recurrence pattern along the **layer-depth axis** in the [[recurrence-taxonomy|recurrence taxonomy]]: activations flow back from deeper layers to shallower ones, forming a looped Transformer block (Mozer et al., 2026).

## Typical Forms

These correspond to the unrolled pattern in Figure 5b of Mozer et al.:

- **Looped Transformer** (Giannou et al., 2023; Dehghani et al., 2019): a single layer or a group of layers is executed repeatedly
- **RINS** (Alabdulmohsin & Zhai, 2025): adaptive depth recurrence
- **Inference-time scaling**: Yang et al. (2024a), Chen et al. (2025b), Geiping et al. (2025), among others
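The looping pattern listed above can be illustrated with a minimal sketch. It assumes a toy residual function stands in for a real Transformer layer; the names `make_toy_block`, `looped_forward`, and `effective_depth` are hypothetical and do not come from any of the cited papers. The only point is that the same group of blocks, with shared parameters, is applied several times, so the unrolled computation is deeper than the parameter count suggests.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_toy_block(scale: float, shift: float) -> Block:
    """Toy residual block x -> x + (scale * x + shift).

    Hypothetical stand-in for a Transformer layer, for illustration only.
    """
    def block(x: Vector) -> Vector:
        return [v + (scale * v + shift) for v in x]
    return block


def looped_forward(blocks: Sequence[Block], x: Vector, n_loops: int) -> Vector:
    """Apply the same group of blocks n_loops times (parameters shared across loops)."""
    if n_loops < 1:
        raise ValueError("n_loops must be >= 1")
    for _ in range(n_loops):
        for block in blocks:
            x = block(x)
    return x


def effective_depth(n_blocks: int, n_loops: int) -> int:
    """Number of layer applications in the unrolled computation."""
    return n_blocks * n_loops


# Two toy blocks: the first maps x -> 1.5 * x, the second maps x -> x + 1.
blocks = [make_toy_block(0.5, 0.0), make_toy_block(0.0, 1.0)]
x0 = [1.0, 2.0]
y_one = looped_forward(blocks, x0, n_loops=1)    # one pass through the group
y_three = looped_forward(blocks, x0, n_loops=3)  # same parameters, three passes
```

In this sketch, raising `n_loops` at inference time adds computation without adding parameters, which is the sense in which looping relates to inference-time scaling; how real systems choose the loop count is not specified here.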
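
## Key Limitation

Although depth recurrence increases expressive power (Saunshi et al., 2025), it **cannot achieve unbounded state tracking**:

> This is because s(t+1) must reside at a higher layer than s(t); however many times the depth loop runs, the state representation keeps shifting upward along the vertical (depth) axis.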
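The quoted argument can be written as a short bound. This is a sketch under assumed notation that does not appear in the source: $\ell(t)$ is the lowest layer of the unrolled network at which $s(t)$ is available, $L$ is the number of layers in the looped group, and $K$ is the number of loops.

$$
\ell(t+1) \ge \ell(t) + 1
\;\Longrightarrow\;
\ell(T) \ge \ell(0) + T,
\qquad
\ell(T) \le L \cdot K
\;\Longrightarrow\;
T \le L \cdot K - \ell(0).
$$

Under these assumptions, any fixed loop count $K$ bounds the number of sequential state updates $T$ that can be tracked, which is why looping deepens the computation but does not remove the limit.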
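
## Applications

- **Inference-time compute scaling** (test-time compute scaling)
- **Fine-tuning adaptation**: a pretrained model fine-tuned with depth recurrence (Koishekenov et al., 2025)
- **Training-free recurrence**: purely inference-time methods that improve reasoning (Li et al., 2025b; Chen et al., 2026)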

## References

- [[step-recurrence|Step Recurrence]]
- [[recurrence-taxonomy|Recurrence Taxonomy]]
- [[coarse-grained-recurrence|Coarse-Grained Recurrence]]
- [[mozer-topological-trouble-transformers-2026]]