---
title: "Sequential Dependency"
created: 2026-06-18
updated: 2026-06-18
type: concept
tags: [transformers, recurrence, parallelization, training]
sources:
- mozer-topological-trouble-transformers-2026
---
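# Sequential Dependency
Sequential dependency is the **serial computation constraint inherent to state tracking**: a dependency that **rules out full parallelization across the sequence length** (Mozer et al., 2026).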
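## Essence
Any state update function `s_t = f(s_{t-1}, x_t)` carries an inherent sequential bottleneck:
- Computing `s_t` **must wait** until `s_{t-1}` has been computed
- This is a **defining property** of state tracking, not an implementation detail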
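
The update rule above can be sketched as a plain loop. This is a minimal illustration, not code from the cited paper: `run_recurrence` and `parity_step` are hypothetical names, and parity is chosen only as a small, familiar state-tracking task.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")
X = TypeVar("X")


def run_recurrence(f: Callable[[S, X], S], s0: S, xs: Sequence[X]) -> List[S]:
    """Apply s_t = f(s_{t-1}, x_t) over xs and return every intermediate state."""
    states: List[S] = []
    s = s0
    for x in xs:
        # This call cannot start until the previous iteration has produced s.
        s = f(s, x)
        states.append(s)
    return states


def parity_step(s: int, x: int) -> int:
    """Toy update (an assumption for illustration): track the parity of the bits seen."""
    return s ^ x


states = run_recurrence(parity_step, 0, [1, 0, 1, 1])
print(states)  # [1, 1, 0, 1]
```

The loop body is the whole point: each iteration reads the value written by the one before it, so the iterations cannot simply be handed to independent workers.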
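## In Training
- **The problem with teacher forcing**: standard Transformer training processes all tokens in parallel, and this is the root cause of weak state tracking
- **Autoregressive unrolling**: true recurrence must be unrolled step by step even during training (Teoh et al., 2025b)
- **Mozer et al.'s definition**: a "recurrent step" is strictly defined as a sequential dependency that rules out parallelization across the sequence during training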
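
The contrast between the first two points can be sketched with a toy predictor. Everything here is an assumption for illustration: `toy_predict` stands in for a model, and `teacher_forced` and `unrolled` are hypothetical helpers, not the training procedure of either cited paper.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[int]], int]


def toy_predict(prefix: Sequence[int]) -> int:
    """Hypothetical stand-in for a model: next token = (sum of prefix + 1) mod 5."""
    return (sum(prefix) + 1) % 5


def teacher_forced(predict: Predictor, targets: Sequence[int]) -> List[int]:
    """Predict each position from the ground-truth prefix.

    Every prefix is known before training starts, so the positions are
    independent of one another. Evaluating them in reverse order makes
    that explicit: no position waits on another position's output.
    """
    out = [0] * len(targets)
    for t in reversed(range(len(targets))):
        out[t] = predict(targets[:t])
    return out


def unrolled(predict: Predictor, steps: int) -> List[int]:
    """Feed the model its own outputs; step t needs the results of steps 0..t-1."""
    generated: List[int] = []
    for _ in range(steps):
        generated.append(predict(generated))
    return generated


targets = [1, 2, 4, 3]
print(teacher_forced(toy_predict, targets))  # [1, 2, 4, 3]
print(unrolled(toy_predict, 4))              # [1, 2, 4, 3]
```

The two functions agree here only because `targets` was picked to match what the toy predictor generates. The difference lies in the data flow: `teacher_forced` could run all positions at once, while `unrolled` has to run them in order.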
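## Impact on Different Architectures
| Architecture | Training parallelization | State-tracking capability |
|------|----------|------------|
| Pure feedforward Transformer | Fully parallel | Limited by depth |
| Depth-recurrent (looped) | Fully parallel | Limited by depth |
| Linear SSM | Parallelizable (associative scan) | No better than a Transformer |
| Truly recurrent architecture | Partially serial | Unbounded state tracking |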
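
The "associative scan" entry in the linear SSM row can be illustrated with a small sketch. It assumes a scalar linear recurrence `s_t = a_t * s_{t-1} + b_t` with `s_0 = 0`, which is a simplification of real SSM layers. The function names are hypothetical, and the scan is a plain-Python Hillis-Steele variant rather than any library's implementation.

```python
from typing import List, Sequence, Tuple

Affine = Tuple[int, int]  # (a, b) represents the map s -> a * s + b


def combine(left: Affine, right: Affine) -> Affine:
    """Compose two affine maps: apply `left` first, then `right`. Associative."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(pairs: Sequence[Affine]) -> List[int]:
    """Reference loop: s_t = a_t * s_{t-1} + b_t, starting from s_0 = 0."""
    s = 0
    out: List[int] = []
    for a, b in pairs:
        s = a * s + b
        out.append(s)
    return out


def associative_scan(pairs: Sequence[Affine]) -> List[int]:
    """Inclusive prefix scan in about log2(n) rounds.

    Within a round, every element is computed only from the previous
    round's values, so the per-element work could run in parallel.
    """
    elems = list(pairs)
    offset = 1
    while offset < len(elems):
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(len(elems))
        ]
        offset *= 2
    # With s_0 = 0, the state after step t is the b component of the prefix map.
    return [b for _, b in elems]


pairs = [(2, 1), (3, 0), (1, 4), (2, 2)]
print(sequential_scan(pairs))   # [1, 3, 7, 16]
print(associative_scan(pairs))  # [1, 3, 7, 16]
```

The trick relies on the update being linear, so that composing two steps yields another step of the same compact form. A general nonlinear `f` offers no such closed-form composition, which is consistent with the last table row requiring partially serial training.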
## References
- [[state-tracking|State tracking]]
- [[feedforward-depth-limitation|Feedforward depth limitation]]
- [[autoregressive-unrolling|Autoregressive unrolling]]
- [[mozer-topological-trouble-transformers-2026]]