20260625: lots of new content

2026-06-25 14:08:47 +08:00
parent 91fac5b6fc
commit 6021dea160
375 changed files with 19263 additions and 251 deletions
--- a/concepts/induction-heads.md
+++ b/concepts/induction-heads.md
@@ -0,0 +1,49 @@
+---
+title: "Induction Heads"
+created: 2026-06-18
+updated: 2026-06-18
+type: concept
+tags: ["llm-mechanism", "in-context-learning", "synthetic-task"]
+sources: ["https://arxiv.org/abs/2312.00752"]
+---
+
+# Induction Heads
+
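+## Definition
+
+Induction heads are an attention pattern described by Olsson et al. (2022) and hypothesized to be a key mechanism behind the **in-context learning** ability of LLMs. The Mamba paper uses them as the second of its two core synthetic tasks for demonstrating what a selective SSM can do.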
+
+## Mechanism
+
+An induction head performs **context-based associative recall**:
+
+```
+Sequence: ... [A] [B] ... [A] → ?
+On seeing the second [A], the model must "recall" that the first [A] was followed by [B], and predict [B]
+```
+
+In essence, this is pattern matching on "what happened earlier": `[prefix] ... [prefix] → [completion]`.
+
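+## Why It Matters
+
+Olsson et al. found that induction heads emerge abruptly during Transformer training, in a **phase change**, and that their appearance coincides closely with the formation of in-context learning ability. The attention mechanism of a Transformer naturally supports this "prefix matching + copying" operation.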
+
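+## Role in Mamba
+
+The Mamba paper uses Induction Heads as its second core synthetic benchmark:
+
+- LTI sequence models (S4, H3, Hyena) are limited on this task: their time-invariant parameters cannot implement the selective behavior of "deciding the output based on the content of the prefix"
+- Mamba's S6 mechanism ([[selective-state-space]]) makes the parameters input-dependent, giving the model the ability to "decide what to do based on what it sees"
+- Mamba not only solves Induction Heads but also **extrapolates to sequences of more than 1M tokens**, far longer than those seen in training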
+
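+## Related Concepts
+
+- [[selective-copy]]: the other diagnostic synthetic task
+- [[content-based-reasoning]]: the capability that Induction Heads requires
+- [[selective-state-space]]: the key to how Mamba solves this task
+- [[in-context-learning]]: the phenomenon that induction heads help explain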
+
+## References
+
+- Olsson et al. (2022) "In-context Learning and Induction Heads"
+- [[gu-mamba|Mamba]] (Gu & Dao, 2024) Section 3.1
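
As a rough illustration of the task described in `concepts/induction-heads.md` above, the sketch below builds one synthetic `[A] [B] ... [A] → [B]` sequence and solves it with an explicit "prefix match + copy" lookup. The function names `make_induction_example` and `induction_lookup`, the vocabulary size, and the sequence length are illustrative assumptions, not taken from the Mamba paper's code or from Olsson et al.; the lookup is a hand-written baseline, not a model.

```python
import random


def make_induction_example(seq_len=16, vocab_size=16, seed=0):
    """Build one sequence of random filler tokens in which a special
    token [A] appears once, followed by a payload token [B], and then
    appears again as the final token. The target is the payload [B]."""
    rng = random.Random(seed)
    special = vocab_size - 1  # assumption: reserve the last id as [A]
    # Filler tokens are drawn from [0, special), so they never equal [A].
    tokens = [rng.randrange(special) for _ in range(seq_len)]
    # Position of the first [A]; leaves room for [B] before the final slot.
    pos = rng.randrange(0, seq_len - 2)
    payload = tokens[pos + 1]  # the [B] that follows the first [A]
    tokens[pos] = special
    tokens[-1] = special  # second [A], acting as the query
    return tokens, payload


def induction_lookup(tokens):
    """Prefix match + copy: find the most recent earlier occurrence of
    the final token and return the token that followed it."""
    query = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == query:
            return tokens[i + 1]
    return None  # the query token never appeared earlier


if __name__ == "__main__":
    toks, target = make_induction_example(seq_len=16, vocab_size=16, seed=0)
    print("sequence:", toks)
    print("target:", target, "lookup:", induction_lookup(toks))
```

The lookup depends on the content of the final token, not on a fixed offset, which is the kind of content-dependent behavior the note says time-invariant models struggle to express.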