20260625: lots of new content

2026-06-25 14:08:47 +08:00
parent 91fac5b6fc
commit 6021dea160
375 changed files with 19263 additions and 251 deletions
--- a/concepts/attention-mechanism.md
+++ b/concepts/attention-mechanism.md
@@ -0,0 +1,49 @@
+---
+title: "Attention Mechanism"
+created: 2026-06-18
+updated: 2026-06-18
+type: concept
+tags: ["attention", "transformer", "sequence-modeling"]
+sources: ["https://arxiv.org/abs/2312.00752"]
+---
+
+# Attention Mechanism
+
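+## Definition
+
+The attention mechanism is the core module of the Transformer architecture (Vaswani et al., 2017). Through query-key-value interactions it performs **content-aware information routing** between the tokens of a sequence: each token's attention distribution is determined by the scaled dot-product similarity between its query and the keys of the other tokens.
+
+## Core formula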
+
+```
+Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
+```
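
As a minimal sketch of the formula above, the following plain-Python version (standard library only, nested lists instead of tensors) computes the attention weights and the output for a tiny example. The function names `softmax` and `attention` and the toy matrices are illustrative assumptions, not code from the Mamba paper or from any library.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for row-major nested lists.

    Q: n_q rows of length d_k, K: n_k rows of length d_k,
    V: n_k rows of length d_v. Returns (output, weights).
    """
    d_k = len(K[0])
    d_v = len(V[0])
    output, weights = [], []
    for q in Q:
        # One score per key: the scaled dot product q . k / sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output row is the weighted average of the value rows.
        output.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(d_v)])
    return output, weights


# Toy inputs (hypothetical): two tokens, d_k = d_v = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, weights = attention(Q, K, V)
```

Each row of `weights` sums to 1, and the first query puts more weight on the first key because their dot product is larger. That is the content dependence the definition describes.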
+
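+## Comparison with Mamba
+
+The Mamba paper treats attention as the reference standard for **content-based reasoning**:
+
+| Dimension | Attention | Mamba (S6) |
+|------|----------|-----------|
+| Content awareness | ✅ (Q-K inner products are inherently content-dependent) | ✅ (B, C, Δ are functions of the input) |
+| Time complexity in sequence length n | O(n²) | O(n) |
+| Mechanism | Explicit token-to-token interaction | Each token is processed on its own, then selectively written into the state |
+| Inference memory | O(n) KV cache | O(1) hidden state |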
+
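+## Core properties
+
+- **Dense routing**: every token interacts with all preceding tokens, so total cost grows as O(n²) in sequence length n
+- **KV cache**: autoregressive inference has to cache the (k, v) pairs of all past tokens
+- **Theoretically unbounded context**: in practice it is limited by memory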
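
The properties above can be illustrated with a small sketch of autoregressive decoding with a KV cache. It is an illustration under assumptions: the class `KVCache`, its `score_count` counter, and the toy vectors are hypothetical, and a real implementation would use batched tensors and separate learned projections for q, k, and v.

```python
import math


class KVCache:
    """Hypothetical single-head cache holding every past (k, v) pair."""

    def __init__(self):
        self.keys = []
        self.values = []
        self.score_count = 0  # total q . k products computed so far

    def step(self, q, k, v):
        """Append the new (k, v), then attend from q over the whole cache."""
        self.keys.append(k)
        self.values.append(v)
        scale = math.sqrt(len(k))
        scores = [sum(a * b for a, b in zip(q, key)) / scale for key in self.keys]
        self.score_count += len(scores)
        # Numerically stable softmax over the cached positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        return [
            sum(w * val[c] for w, val in zip(weights, self.values))
            for c in range(len(v))
        ]


cache = KVCache()
n = 6
for t in range(n):
    # Toy token vector reused as q, k and v (an assumption for brevity).
    vec = [float(t), 1.0]
    out = cache.step(vec, vec, vec)

cached_tokens = len(cache.keys)   # grows linearly with n
total_scores = cache.score_count  # 1 + 2 + ... + n = n (n + 1) / 2
```

The cache holds one (k, v) pair per generated token, which is the O(n) inference memory, while the number of query-key products summed over all steps is n(n + 1)/2, which is the O(n²) cost of dense routing.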
+
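+## Related concepts
+
+- [[content-based-reasoning]]: a capability attention has by construction
+- [[kv-cache]]: the inference-memory bottleneck of attention
+- [[selective-state-space|selection mechanism]]: Mamba's alternative route
+- [[gu-mamba|Mamba paper]]
+
+## References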
+
+- Vaswani et al. (2017) "Attention Is All You Need"
+- [[gu-mamba|Mamba]] (Gu & Dao, 2024)