20260625: lots of new content
concepts/sequential-dependency.md

---
title: "Sequential Dependency"
created: 2026-06-18
updated: 2026-06-18
type: concept
tags: [transformers, recurrence, parallelization, training]
sources:
- mozer-topological-trouble-transformers-2026
---

# Sequential Dependency

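Sequential dependency is the **serial computational constraint that state tracking requires**: a dependency that **rules out full parallelization across the sequence length** (Mozer et al., 2026).
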
## Essence

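A general state update function `s_t = f(s_{t-1}, x_t)` has an inherent sequential bottleneck:
- Computing `s_t` **must wait** until `s_{t-1}` has been computed
- This is a **defining feature** of state tracking, not an implementation detail
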
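Unrolling the update a few steps makes the chain explicit. For a general `f` with no exploitable algebraic structure, `s_T` is a `T`-fold nested application of `f`:

```latex
\begin{aligned}
s_1 &= f(s_0, x_1) \\
s_2 &= f(s_1, x_2) = f\bigl(f(s_0, x_1),\, x_2\bigr) \\
    &\;\;\vdots \\
s_T &= f\Bigl(f\bigl(\cdots f(f(s_0, x_1),\, x_2) \cdots,\, x_{T-1}\bigr),\, x_T\Bigr)
\end{aligned}
```

The critical path therefore contains `T` dependent evaluations of `f`: more hardware can speed up each evaluation, but it cannot shorten the chain.
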
## How It Shows Up in Training

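- **The problem with teacher forcing**: in standard Transformer training, all token positions are processed in parallel, each conditioned on the ground-truth prefix rather than on the model's own earlier outputs. This is the root cause of poor state tracking
- **Autoregressive unrolling**: genuine recurrence must be unrolled step by step even during training (Teoh et al., 2025b)
- **Mozer et al.'s definition**: a "recurrent step" is strictly defined as a sequential dependency that rules out parallelization across the sequence during training
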
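A toy Python sketch of the contrast above, under loudly stated assumptions: the scalar `step` function stands in for one recurrent update, and `teacher_forced_states` models teacher forcing by reading each position's "previous state" from supplied data (`given_states`) instead of from the model's own earlier output. All names and weights are hypothetical; this is an analogy, not a Transformer training loop.

```python
import math
from typing import List


def step(h_prev: float, x: float) -> float:
    # Hypothetical scalar nonlinear update standing in for one recurrent step.
    return math.tanh(0.5 * h_prev + x)


def teacher_forced_states(xs: List[float], given_states: List[float]) -> List[float]:
    # Teacher-forcing analogy: the "previous state" for position t is read from
    # given_states[t] (supplied data), not from this function's own output.
    # No call depends on another call, so all positions could run in parallel.
    return [step(h_prev, x) for h_prev, x in zip(given_states, xs)]


def unrolled_states(xs: List[float], h0: float = 0.0) -> List[float]:
    # Autoregressive unrolling: step t consumes the state produced at step t-1,
    # so the iterations must run in order.
    states: List[float] = []
    h = h0
    for x in xs:
        h = step(h, x)
        states.append(h)
    return states


xs = [1.0, 0.0, 1.0]
unrolled = unrolled_states(xs)
# Feeding in the model's own previous states reproduces the unrolled result...
print(teacher_forced_states(xs, [0.0] + unrolled[:-1]) == unrolled)  # True
# ...but previous states taken from anywhere else give different outputs.
print(teacher_forced_states(xs, [0.0, 0.0, 0.0]) == unrolled)  # False
```

Only the unrolled version ever exposes step `t` to the consequences of step `t-1`, and that exposure is exactly what costs parallelism.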
## Impact on Different Architectures

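| Architecture | Training parallelization | State-tracking capability |
|------|----------|------------|
| Pure feedforward Transformer | Fully parallel | Bounded by depth |
| Depth-recurrent (looped) | Fully parallel | Bounded by depth |
| Linear SSM | Parallelizable (associative scan) | No better than a Transformer |
| Truly recurrent architecture | Partly serial | Unbounded state tracking |
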
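The "associative scan" entry for linear SSMs can be sketched with a scalar linear recurrence `h_t = a_t * h_{t-1} + b_t`, a deliberate simplification (actual SSMs use vector states and matrix or diagonal transitions). Each step is an affine map, and composing two affine maps yields another affine map, so prefixes can be combined in a tree instead of a chain. The function names and data are hypothetical; this is the textbook Hillis-Steele scan, not any particular SSM library's kernel.

```python
from typing import List, Tuple

# (a, b) encodes the affine map h -> a * h + b for one time step.
Pair = Tuple[float, float]


def combine(first: Pair, second: Pair) -> Pair:
    # Apply `first`, then `second`:
    # a2 * (a1 * h + b1) + b2 = (a2 * a1) * h + (a2 * b1 + b2).
    # Composition of maps is associative, which is what a parallel scan needs.
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def sequential(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    # Reference loop: h_t = a_t * h_{t-1} + b_t, one step at a time.
    states: List[float] = []
    h = h0
    for at, bt in zip(a, b):
        h = at * h + bt
        states.append(h)
    return states


def scan(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    # Hillis-Steele inclusive scan: ceil(log2(T)) rounds. Inside a round each
    # position reads only the previous round's list, so the updates within a
    # round are independent and could be executed in parallel.
    elems: List[Pair] = list(zip(a, b))
    offset = 1
    while offset < len(elems):
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(len(elems))
        ]
        offset *= 2
    # elems[t] now composes the maps for steps 0..t; apply it to h0.
    return [at * h0 + bt for at, bt in elems]


a = [0.5, 2.0, 1.0, 0.25, 3.0]
b = [1.0, -1.0, 2.0, 0.5, 0.0]
print(sequential(a, b, h0=1.0))  # [1.5, 2.0, 4.0, 1.5, 4.5]
print(scan(a, b, h0=1.0))        # [1.5, 2.0, 4.0, 1.5, 4.5]
```

The trick depends on `combine` returning a fixed-size summary of any stretch of steps. A nonlinear update such as `tanh(w * h + x)` does not, in general, compose into such a closed form, which is where the sequential dependency reappears.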
## References

- [[state-tracking|State Tracking]]
- [[feedforward-depth-limitation|Feedforward Depth Limitation]]
- [[autoregressive-unrolling|Autoregressive Unrolling]]
- [[mozer-topological-trouble-transformers-2026]]