20260625: lots of new content
41
concepts/step-recurrence.md
Normal file
@@ -0,0 +1,41 @@
---
title: "Step Recurrence"
created: 2026-06-18
updated: 2026-06-18
type: concept
tags: [transformers, recurrence, ssm, state-tracking]
sources:
  - mozer-topological-trouble-transformers-2026
---

# Step Recurrence

Step recurrence is the recurrence pattern that runs along the **input-step axis** in the [[recurrence-taxonomy|recurrence taxonomy]]: activations within a layer flow from one step to the next (Mozer et al., 2026).

## Correspondence to Mozer et al., Figure 7

Activations propagate laterally **within the same layer** from step t-1 to step t, in contrast to the vertical propagation of depth recurrence.

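A minimal sketch of this horizontal flow, assuming a scalar state and a linear update `h_t = a * h_{t-1} + b * x_t`. The function name and the coefficients `a` and `b` are hypothetical, chosen only for illustration, and are not taken from Mozer et al.

```python
def step_recurrence(xs, a=0.5, b=1.0, h0=0.0):
    """Run a single layer over a sequence, carrying its state across steps.

    Hypothetical toy layer: h_t = a * h_{t-1} + b * x_t.
    """
    h = h0
    states = []
    for x in xs:           # iterate along the input-step axis
        h = a * h + b * x  # the state at step t depends on the same layer's state at t-1
        states.append(h)
    return states


print(step_recurrence([1.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25]
```

The loop runs over input steps, not over layers: the only thing carried forward is the state `h` of this one layer, which is what separates step recurrence from depth recurrence.
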
## Representative architectures

| Architecture | Characteristics |
|------|------|
| **Linear attention** (Katharopoulos et al., 2020) | Kernelized attention, linear complexity |
| **Mamba** (Gu & Dao, 2024) | Selective state space model with input-dependent gating |
| **RWKV-7** (Peng et al., 2025) | Linear attention + delta rule |
| **DeltaNet** (Schlag et al., 2021) | Fast-weight updates driven by the delta rule |
| **PaTH Attention** (Yang et al., 2025b) | Path attention |
| **Canon Layers** (Allen-Zhu, 2025) | Layer structure in canonical form |
| **Test-Time Regression** (Sun et al., 2025) | Regression updates at inference time |

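The additive update of linear attention and the delta rule used by DeltaNet can both be written as a step recurrence over a matrix state `S`. The sketch below is a simplified illustration in plain Python, assuming an identity feature map, no normalizer, and unit-norm keys. The helper names are hypothetical and do not come from any of the cited implementations.

```python
def outer(v, k):
    """Outer product v k^T as a list of rows."""
    return [[vi * kj for kj in k] for vi in v]


def matvec(S, k):
    """Read from the state: S k."""
    return [sum(sij * kj for sij, kj in zip(row, k)) for row in S]


def add(S, U):
    """Elementwise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rs, ru)] for rs, ru in zip(S, U)]


def additive_update(S, k, v):
    """Linear-attention style write: S_t = S_{t-1} + v_t k_t^T."""
    return add(S, outer(v, k))


def delta_update(S, k, v, beta=1.0):
    """Delta-rule write: S_t = S_{t-1} + beta * (v_t - S_{t-1} k_t) k_t^T."""
    pred = matvec(S, k)                                  # what the state currently returns for k
    err = [beta * (vi - pi) for vi, pi in zip(v, pred)]  # scaled prediction error
    return add(S, outer(err, k))


k = [1.0, 0.0]                         # the same unit-norm key is written twice
writes = [[1.0, 2.0], [3.0, 4.0]]      # first value, then a replacement value

S_add = [[0.0, 0.0], [0.0, 0.0]]
S_delta = [[0.0, 0.0], [0.0, 0.0]]
for v in writes:
    S_add = additive_update(S_add, k, v)
    S_delta = delta_update(S_delta, k, v)

print(matvec(S_add, k))    # [4.0, 6.0]: the two values are summed
print(matvec(S_delta, k))  # [3.0, 4.0]: the second value overwrites the first
```

In both cases the state at step t is computed from the same layer's state at step t-1. The difference is that the delta rule subtracts what is already stored under the key before writing, so a repeated key replaces its old value instead of accumulating.
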
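## Expressivity limits

Merrill et al. (2025) prove that SSMs with **linear updates** are no more expressive than standard Transformers. Once the eigenvalues of the state transition are allowed to be **negative** (Grazzi et al., 2025), however, DeltaNet exceeds the expressive power of standard Transformers.
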
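A toy illustration of why the sign of the eigenvalue matters, assuming a scalar linear recurrence `h_t = a(x_t) * h_{t-1}` with `h_0 = 1`. The function below is a hypothetical example, not a reproduction of the constructions in Merrill et al. or Grazzi et al. If `a(x_t)` may take the value -1, the state flips sign on every 1 in the input and therefore tracks parity. If `a(x_t)` is restricted to be non-negative, the sign of `h_t` can never change, so this particular scheme is unavailable.

```python
def parity_via_linear_recurrence(bits):
    """Track the parity of a bit sequence with h_t = a(x_t) * h_{t-1}.

    Hypothetical toy: a(x) = -1 when x == 1, and +1 otherwise.
    """
    h = 1.0
    for x in bits:
        a = -1.0 if x == 1 else 1.0  # negative transition value on a 1
        h = a * h                    # linear update of the state
    return int((1 - h) / 2)          # h = +1 maps to 0 (even), h = -1 maps to 1 (odd)


print(parity_via_linear_recurrence([1, 0, 1, 1]))  # 1 (three ones: odd)
print(parity_via_linear_recurrence([1, 1]))        # 0 (two ones: even)
```
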
## References

- [[depth-recurrence|Depth recurrence]]
- [[state-space-models|State space models]]
- [[enhanced-state-space-models|Enhanced state space models]]
- [[sequential-dependency|Sequential dependency]]
- [[mozer-topological-trouble-transformers-2026]]