---
title: "Attention Mechanism"
created: 2026-06-18
updated: 2026-06-18
type: concept
tags: ["attention", "transformer", "sequence-modeling"]
sources: ["https://arxiv.org/abs/2312.00752"]
---
# Attention Mechanism
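## Definition
The attention mechanism is the core module of the Transformer architecture (Vaswani et al., 2017). It uses query-key-value interactions to perform **content-aware information routing** between the tokens of a sequence. Each token's attention distribution is determined by the semantic similarity between its query and the keys of the other tokens.
## Core formula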
```
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
```
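
The formula can be sketched in plain Python as follows. This is a minimal, unbatched, single-head illustration with no masking; the function names `softmax` and `attention` and the toy inputs are assumptions made for this example and do not come from either cited paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention on lists of row vectors.

    Q is n_q x d_k, K is n_k x d_k, V is n_k x d_v; returns n_q x d_v.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    scale = math.sqrt(d_k)
    out = []
    for q in Q:
        # Similarity of this query to every key, divided by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        # Attention distribution over the keys; the weights sum to 1.
        weights = softmax(scores)
        # Output row is the weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return out

# Toy input (hypothetical): one query that matches the first key more closely.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
result = attention(Q, K, V)
```

Because the query aligns with the first key, the first value vector receives the larger weight, which is the content-dependent routing described in the definition.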
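## Comparison with Mamba
The Mamba paper treats attention as the reference standard for **content-aware reasoning**:
| Dimension | Attention | Mamba (S6) |
|------|----------|-----------|
| Content awareness | ✅ (the Q-K inner product is inherently content-dependent) | ✅ (B, C, Δ are functions of the input) |
| Complexity in sequence length n | O(n²) | O(n) |
| Mechanism | explicit token-to-token interaction | each token is processed independently, then selectively written to memory |
| Inference memory | O(n) KV cache | O(1) hidden state |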
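
The inference-memory row can be illustrated with a small sketch that tracks how many floats each approach holds after every token. The gated recurrence below is a hypothetical toy stand-in for an input-dependent state update. It is not the S6 update from the Mamba paper, and the function names and scalar tokens are assumptions made for this example.

```python
import math

def kv_cache_sizes(tokens):
    """Floats stored after each step when one (key, value) pair is cached per token.

    For simplicity the key and the value are both the scalar token itself.
    """
    cache = []
    sizes = []
    for x in tokens:
        cache.append((x, x))          # the cache grows by one pair per token
        sizes.append(2 * len(cache))  # two floats per cached pair
    return sizes

def gated_state_sizes(tokens):
    """Floats stored after each step by a toy gated recurrence with a scalar state.

    The update is h = (1 - g) * h + g * x, where the gate g depends on the input.
    This is an illustrative assumption, not Mamba's actual parameterization.
    """
    h = 0.0
    sizes = []
    for x in tokens:
        g = 1.0 / (1.0 + math.exp(-x))  # gate computed from the current input
        h = (1.0 - g) * h + g * x       # history is overwritten in place
        sizes.append(1)                 # the state never grows
    return h, sizes

tokens = [0.5, -1.0, 2.0, 0.0]
cache_sizes = kv_cache_sizes(tokens)        # grows linearly with the sequence
h, state_sizes = gated_state_sizes(tokens)  # constant regardless of length
```

The cache size grows with every token while the recurrent state stays at a fixed size, which is the O(n) versus O(1) contrast in the table.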
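## Core properties
- **Dense routing**: every token interacts with all preceding tokens → O(n²)
- **KV cache**: autoregressive inference must cache every past (k, v) pair
- **Theoretically unbounded context**: limited in practice by memory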
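
The first two properties can be seen together in a sketch of autoregressive decoding with a KV cache. This is a minimal single-head illustration; the helper names `attend` and `decode_with_cache` and the toy vectors are assumptions made for this example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Attention output for a single query over the given keys and values."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]

def decode_with_cache(qs, ks, vs):
    """Process tokens one at a time, caching every (k, v) pair seen so far."""
    k_cache, v_cache, outputs = [], [], []
    interactions = 0
    for q, k, v in zip(qs, ks, vs):
        k_cache.append(k)  # the cache keeps the full history
        v_cache.append(v)
        outputs.append(attend(q, k_cache, v_cache))
        interactions += len(k_cache)  # one score per cached key at this step
    return outputs, interactions, len(k_cache)

# Toy sequence of four tokens (hypothetical values).
qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
ks = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [0.5, 0.5]]
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
outputs, interactions, cache_len = decode_with_cache(qs, ks, vs)
```

For n tokens the loop computes n(n+1)/2 query-key scores in total, which is the O(n²) cost of dense routing, and the cache ends up holding n pairs, which is the O(n) inference memory.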
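## Related concepts
- [[content-based-reasoning]]: a capability attention has by construction
- [[kv-cache]]: the inference-memory bottleneck of attention
- [[selective-state-space|selection mechanism]]: Mamba's alternative route
- [[gu-mamba|Mamba paper]]
## References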
- Vaswani et al. (2017) "Attention Is All You Need"
- [[gu-mamba|Mamba]] (Gu & Dao, 2024)