20260625: lots of new content

2026-06-25 14:08:47 +08:00
parent 91fac5b6fc
commit 6021dea160
375 changed files with 19263 additions and 251 deletions
--- a/concepts/ssd-algorithm.md
+++ b/concepts/ssd-algorithm.md
@@ -0,0 +1,53 @@
+---
+title: "SSD Algorithm (Structured State Space Duality Algorithm)"
+created: 2026-06-18
+updated: 2026-06-18
+type: concept
+tags: [algorithm, ssm, matrix-multiplication, gpu]
+sources:
+  - dao-transformers-are-ssms-2024
+---
+
+# SSD Algorithm
+
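+The SSD algorithm, proposed by Dao & Gu (2024), is a **hybrid matrix-multiplication algorithm**. It uses a block decomposition of [[semiseparable-matrices|semiseparable matrices]] to combine the linear FLOP count of the recurrent form with the hardware efficiency of matrix multiplication on modern GPUs.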
+
+## Core Idea
+
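+The [[structured-state-space-duality|SSD framework]] shows that an SSM can be computed in two equivalent ways:
+1. **Recurrent form**: O(T) time. It is evaluated as a scan over the sequence rather than as matrix multiplications, so it cannot use GPU Tensor Cores.
+2. **Dual (matrix) form**: O(T²) time. It consists of efficient matrix multiplications.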
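To make the duality concrete, here is a minimal pure-Python sketch that evaluates the same SSM both ways. It assumes a single input channel, an N-dimensional state, and a scalar decay `a[t]` per step (that is, A_t = a_t·I). The function names `ssm_recurrent` and `ssm_quadratic` are hypothetical illustrations, not the paper's code.

```python
import random


def ssm_recurrent(a, B, C, x):
    """Recurrent form, O(T): h_t = a_t * h_{t-1} + B_t * x_t, y_t = <C_t, h_t>."""
    n = len(B[0])
    h = [0.0] * n
    y = []
    for t in range(len(x)):
        h = [a[t] * h[k] + B[t][k] * x[t] for k in range(n)]
        y.append(sum(C[t][k] * h[k] for k in range(n)))
    return y


def ssm_quadratic(a, B, C, x):
    """Dual form, O(T^2): build the lower-triangular T x T matrix M, then y = M x."""
    T = len(x)
    M = [[0.0] * T for _ in range(T)]
    for i in range(T):
        for j in range(i + 1):
            decay = 1.0
            for k in range(j + 1, i + 1):
                decay *= a[k]  # a_{j+1} * ... * a_i
            M[i][j] = decay * sum(c * b for c, b in zip(C[i], B[j]))
    return [sum(M[i][j] * x[j] for j in range(T)) for i in range(T)]


# Usage: both forms agree (up to float rounding) on random inputs.
random.seed(0)
T, N = 8, 4
a = [random.uniform(0.5, 1.0) for _ in range(T)]
B = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(T)]
C = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(T)]
x = [random.gauss(0.0, 1.0) for _ in range(T)]
assert all(abs(p - q) < 1e-9 for p, q in zip(ssm_recurrent(a, B, C, x),
                                              ssm_quadratic(a, B, C, x)))
```

The recurrent version touches each token once. The quadratic version materializes every pairwise interaction, which is wasteful in FLOPs but is exactly the shape that matrix-multiply hardware handles well.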
+
+The SSD algorithm avoids both extremes and instead **decomposes the computation at the block level**:
+
+```
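+Partition the matrix M into blocks of size B × B
+  Within a block (diagonal blocks): use matrix multiplication (GPU-efficient)
+  Across blocks: propagate the recurrent state (keeps complexity linear)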
+```
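The block decomposition can be sketched in pure Python as follows. This is a minimal illustration, not the paper's kernel. It assumes a scalar-decay SSM (A_t = a_t·I) with one input channel. The names `ssd_chunked`, `seg_decay`, and the chunk length `Q` are hypothetical, and the Python loops stand in for the batched matmuls a real GPU implementation would issue. The result is checked against a plain recurrence.

```python
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def seg_decay(a, lo, hi):
    """Product a[lo] * ... * a[hi]; 1.0 for an empty range (lo > hi)."""
    d = 1.0
    for k in range(lo, hi + 1):
        d *= a[k]
    return d


def ssm_recurrent(a, B, C, x):
    """Reference recurrence: h_t = a_t * h_{t-1} + B_t * x_t, y_t = <C_t, h_t>."""
    h, y = [0.0] * len(B[0]), []
    for t in range(len(x)):
        h = [a[t] * hk + bk * x[t] for hk, bk in zip(h, B[t])]
        y.append(dot(C[t], h))
    return y


def ssd_chunked(a, B, C, x, Q):
    """Chunked evaluation: quadratic form within each chunk of length Q,
    recurrent state passing between chunks."""
    T, n = len(x), len(B[0])
    h = [0.0] * n  # state h_{s-1} entering the current chunk
    y = []
    for s in range(0, T, Q):
        e = min(s + Q, T)
        # Diagonal block of M for this chunk (the matmul-friendly part).
        blk = [[seg_decay(a, j + 1, i) * dot(C[i], B[j]) if j <= i else 0.0
                for j in range(s, e)] for i in range(s, e)]
        for i in range(s, e):
            intra = dot(blk[i - s], x[s:e])
            # Off-diagonal contribution: incoming state decayed to position i.
            inter = seg_decay(a, s, i) * dot(C[i], h)
            y.append(intra + inter)
        # Pass the state to the next chunk: one recurrence step per chunk.
        h = [seg_decay(a, s, e - 1) * h[k]
             + sum(seg_decay(a, j + 1, e - 1) * B[j][k] * x[j] for j in range(s, e))
             for k in range(n)]
    return y


# Usage: every chunk length gives the same outputs as the recurrence.
random.seed(1)
T, N = 10, 3
a = [random.uniform(0.5, 1.0) for _ in range(T)]
B = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(T)]
C = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(T)]
x = [random.gauss(0.0, 1.0) for _ in range(T)]
ref = ssm_recurrent(a, B, C, x)
for Q in (1, 3, 4, 10):
    assert all(abs(p - q) < 1e-9 for p, q in zip(ref, ssd_chunked(a, B, C, x, Q)))
```

Q = 1 degenerates to the pure recurrence, and Q = T degenerates to the full quadratic form. Intermediate chunk sizes trade off between the two.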
+
+## Efficiency Comparison
+
+| Algorithm | Training | Inference | GPU utilization |
+|------|:--:|:--:|:--:|
+| Mamba Selective Scan | O(T), scan | O(1) state | Low (no Tensor Cores) |
+| FlashAttention | O(T²) | O(T) KV cache | High |
+| **SSD Algorithm** | **Hybrid, O(T)** | **O(1) state** | **High** |
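A back-of-the-envelope FLOP count per head shows why the hybrid stays linear in T. This is a rough sketch, not the paper's exact accounting. It assumes sequence length T, state size N, chunk length Q, and a head dimension P that this note does not otherwise define:

$$
\underbrace{\frac{T}{Q}\,Q^{2}(N+P)}_{\text{diagonal blocks}}
+\underbrace{\frac{T}{Q}\,QNP}_{\text{chunk states}}
+\underbrace{\frac{T}{Q}\,NP}_{\text{state passing}}
+\underbrace{\frac{T}{Q}\,QNP}_{\text{state}\to\text{output}}
= O\big(TQ(N+P) + TNP\big)
$$

With Q = N = P, this is O(TN²), which is linear in T. Materializing the full T × T matrix instead costs O(T²N).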
+
+## Crossover with FlashAttention
+
+- At sequence length 2K: SSD is **on par** with FlashAttention-2
+- At sequence length 16K: SSD is **6x** faster than FlashAttention-2
+- Supports state sizes **8x larger than Mamba's** with minimal slowdown
+
+## Variable-Length Sequences
+
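+Variable-length training needs no padding tokens. The batch is packed into one long sequence, and the **recurrent state is reset at each sequence boundary** (equivalently, the decay A_t is set to 0 at the boundary), so no information passes between sequences. This is a distinctive advantage of SSMs; Transformers need complex padding-removal techniques.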
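Here is a minimal sketch of padding-free packing. It assumes the boundary is handled by zeroing the per-step decay at the first token of each sequence; this convention and the helper `pack_with_resets` are hypothetical illustrations. Packed this way, two sequences give exactly the outputs of running them separately:

```python
def ssm_recurrent(a, B, C, x):
    """Recurrence h_t = a_t * h_{t-1} + B_t * x_t, y_t = <C_t, h_t>, with h_{-1} = 0."""
    h, y = [0.0] * len(B[0]), []
    for t in range(len(x)):
        h = [a[t] * hk + bk * x[t] for hk, bk in zip(h, B[t])]
        y.append(sum(ck * hk for ck, hk in zip(C[t], h)))
    return y


def pack_with_resets(seqs):
    """Concatenate (a, B, C, x) sequences into one long sequence. Setting the
    decay to 0 at each sequence's first token discards the previous state, so
    no information leaks across the boundary and no padding is needed."""
    a, B, C, x = [], [], [], []
    for sa, sB, sC, sx in seqs:
        a += [0.0] + list(sa[1:])
        B += sB
        C += sC
        x += sx
    return a, B, C, x


# Usage: two sequences of different lengths, one scalar state each.
s1 = ([0.9, 0.8, 0.7], [[1.0], [2.0], [0.5]], [[1.0], [1.0], [1.0]], [1.0, -1.0, 2.0])
s2 = ([0.6, 0.5], [[1.0], [1.0]], [[2.0], [1.0]], [3.0, 1.0])
packed = ssm_recurrent(*pack_with_resets([s1, s2]))
separate = ssm_recurrent(*s1) + ssm_recurrent(*s2)
assert packed == separate
```

The first decay of each original sequence never matters, because it multiplies the zero initial state. Overwriting it with 0 therefore changes nothing within a sequence and only cuts the link between sequences.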
+
+## References
+
+- [[structured-state-space-duality|SSD]]
+- [[semiseparable-matrices|semiseparable matrices]]
+- [[mamba-2|Mamba-2]]
+- [[flash-attention|FlashAttention]]
+- [[dao-transformers-are-ssms-2024|Paper]]