20260625: lots of new content

2026-06-25 14:08:47 +08:00
parent 91fac5b6fc
commit 6021dea160
375 changed files with 19263 additions and 251 deletions
--- a/concepts/content-based-reasoning.md
+++ b/concepts/content-based-reasoning.md
@@ -0,0 +1,66 @@
+---
+title: "Content-Based Reasoning"
+created: 2026-06-18
+updated: 2026-06-18
+type: concept
+tags: ["sequence-modeling", "ssm", "mamba", "attention"]
+sources: ["https://arxiv.org/abs/2312.00752"]
+---
+
+# Content-Based Reasoning
+
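+## Definition
+
+Content-based reasoning is a model's ability to **decide how information is propagated and forgotten based on the actual content of the input tokens, not just their position in time**. The Mamba paper identifies the absence of this ability as the core weakness of LTI sequence models. Transformer attention has it by construction (each token's attention distribution depends on the content interaction between queries and keys), whereas LTI SSMs lack it entirely.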
+
+## Why LTI models lack this ability
+
+An LTI (linear time-invariant) model uses the same parameters at every time step:
+
+```
+h_t = A_bar * h_{t-1} + B_bar * x_t   (A_bar, B_bar do not depend on x_t)
+```
+
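+Whether the input is "important" or "noise", the state update rule is **exactly the same**. The model cannot:
+- selectively remember key tokens
+- ignore irrelevant tokens based on their content
+- change its behavior after seeing a pattern in context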
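+
+One way to see this is to unroll the recurrence above. The following is a sketch that assumes a zero initial state (`h_0 = 0`), which this note does not specify:
+
+```latex
+h_t = \sum_{k=0}^{t-1} \bar{A}^{k}\, \bar{B}\, x_{t-k}
+```
+
+The weight `\bar{A}^{k} \bar{B}` applied to an input depends only on the lag `k`, never on the value of `x_{t-k}`. Two tokens at the same distance from the current step therefore always receive the same weight, whatever they contain.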
+
+## Why Transformers have it
+
+The query-key inner product in self-attention is **content-aware by construction**:
+
+```
+Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
+```
+
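+Q and K are both functions of the input → the attention distribution changes with content → the model can decide "whom to attend to" based on token semantics.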
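+
+Written per token, the dependence on content becomes explicit. This is a sketch using the usual `1/sqrt(d_k)` scaling; the projection matrices `W_Q` and `W_K` are notation assumed here and do not appear elsewhere in this note:
+
+```latex
+\begin{aligned}
+q_i &= W_Q x_i, \qquad k_j = W_K x_j \\
+\alpha_{ij} &= \frac{\exp\bigl(q_i^{\top} k_j / \sqrt{d_k}\bigr)}{\sum_{l} \exp\bigl(q_i^{\top} k_l / \sqrt{d_k}\bigr)} \\
+o_i &= \sum_{j} \alpha_{ij}\, v_j
+\end{aligned}
+```
+
+Each weight `\alpha_{ij}` is a function of both `x_i` and `x_j`, so changing the content of either token changes how much token `i` reads from token `j`.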
+
+## Mamba's solution
+
+Mamba's selection mechanism ([[selective-state-space]]) reaches content awareness by a different route:
+
+```
+B_t, C_t, Δ_t = f(x_t)   ← the SSM parameters become functions of the input
+```
+
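+Rather than letting tokens interact with one another (attention), Mamba lets the **way each token is processed** vary with its content: on an important token it "opens the gate" (large Δ), and on noise it "closes the gate" (small Δ).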
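+
+The gate picture can be made concrete with the zero-order-hold discretization used in the Mamba paper. The limits below are a sketch and assume that the eigenvalues of `A` have negative real parts, an assumption this note does not state:
+
+```latex
+\begin{aligned}
+\bar{A}_t &= \exp(\Delta_t A), \qquad
+\bar{B}_t = (\Delta_t A)^{-1}\bigl(\exp(\Delta_t A) - I\bigr)\,\Delta_t B_t \\
+h_t &= \bar{A}_t\, h_{t-1} + \bar{B}_t\, x_t, \qquad y_t = C_t\, h_t \\
+\Delta_t \to 0 &: \quad \bar{A}_t \to I,\ \ \bar{B}_t \to 0
+  \ \Rightarrow\ h_t \approx h_{t-1} \quad \text{(input ignored, state kept)} \\
+\Delta_t \to \infty &: \quad \bar{A}_t \to 0
+  \ \Rightarrow\ h_{t-1} \text{ is forgotten and } h_t \text{ is set by } x_t \text{ alone}
+\end{aligned}
+```
+
+Because `\Delta_t` is computed from `x_t`, the token itself chooses where it falls between these two extremes.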
+
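+## Diagnostic tasks
+
+Two synthetic tasks test content awareness directly:
+- [[selective-copy]]: the model must decide whether to memorize a token based on its "color"
+- [[induction-heads]]: the model must recall what followed a prefix based on the prefix's content
+
+LTI models fail on both tasks. Mamba solves both, and on induction heads it extrapolates to sequences longer than 1M tokens.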
+
+## Related concepts
+
+- [[selective-state-space]]: the mechanism Mamba uses to achieve content awareness
+- [[structured-state-space-models]]: LTI models, which lack this ability
+- [[attention-mechanism]]: another route to content awareness
+- [[gu-mamba|Mamba paper]]
+
+## References
+
+- [[gu-mamba|Mamba]] (Gu & Dao, 2024) Section 3.1