20260625: lots of new content
concepts/attention-mechanism.md
@@ -0,0 +1,49 @@
---
title: "Attention Mechanism"
created: 2026-06-18
updated: 2026-06-18
type: concept
tags: ["attention", "transformer", "sequence-modeling"]
sources: ["https://arxiv.org/abs/2312.00752"]
---

# Attention Mechanism
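
## Definition

The attention mechanism is the core module of the Transformer architecture (Vaswani et al., 2017). It uses query-key-value interactions to perform **content-aware information routing** between the tokens of a sequence. Each token's attention distribution is determined by the similarity (a scaled dot product) between its query and the keys of the tokens it attends to.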

## Core Formula

```
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
```
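
The formula above can be sketched in plain Python to make each step concrete. This is a minimal, unoptimized illustration that assumes `Q`, `K`, and `V` are lists of row vectors, that `d_k` is the key dimension, and that no causal mask is applied (the formula itself has none). The function names `softmax` and `attention` are chosen here for illustration and do not come from any library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention on lists of row vectors.

    Q: n_q rows of length d_k; K: n_k rows of length d_k;
    V: n_k rows of length d_v. Returns (outputs, weights).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        # Output is the attention-weighted sum of the value rows.
        out = [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Toy inputs (assumed values): the query matches the first key best.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out, w = attention(Q, K, V)
```

In this toy case the query aligns with the first key, so the first weight is the larger one and the output leans toward the first value row. This is the content-dependent routing described in the definition.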
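
## Comparison with Mamba

The Mamba paper treats attention as the reference standard for **content-based reasoning**:

| Dimension | Attention | Mamba (S6) |
|------|----------|-----------|
| Content awareness | ✅ (the Q-K dot product is inherently content-dependent) | ✅ (B, C, and Δ are functions of the input) |
| Complexity (sequence length n) | O(n²) | O(n) |
| Mechanism | Explicit token-to-token interaction | Tokens processed independently, then selectively remembered |
| Inference memory | O(n) KV cache | O(1) hidden state |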
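
The complexity and memory rows can be illustrated with a rough counting sketch. It is not a benchmark and not a model of either architecture's real implementation: it ignores constant factors, heads, and layers, and the parameters `d` (vector width) and `state_size` (size of the fixed recurrent state) are assumed values picked for illustration.

```python
def attention_costs(n, d):
    """Rough counts for causal self-attention over n tokens of width d."""
    # Token t is scored against tokens 1..t, so the total is n(n+1)/2.
    score_pairs = n * (n + 1) // 2
    # The KV cache keeps one key and one value vector per token.
    cache_floats = 2 * n * d
    return score_pairs, cache_floats

def recurrent_costs(n, d, state_size):
    """Rough counts for a recurrence with a fixed-size hidden state."""
    updates = n                    # one state update per token
    state_floats = d * state_size  # does not depend on n
    return updates, state_floats

for n in (1_000, 2_000, 4_000):
    print(n, attention_costs(n, d=64), recurrent_costs(n, d=64, state_size=16))
```

Doubling `n` roughly quadruples the number of attention scores and doubles the cache size, while the recurrent update count only doubles and the state size stays fixed.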
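
## Core Properties

- **Dense routing**: each token interacts with all preceding tokens, giving O(n²) total cost in sequence length n
- **KV cache**: autoregressive inference must cache the (k, v) pairs of all past tokens
- **Theoretically unbounded context**: in practice limited by memory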
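
The KV-cache property can be sketched as a single-head decoding loop. The `KVCache` class below is a hypothetical helper written for this note, not an API from any framework. For brevity each token vector is used directly as its own query, key, and value, whereas a real model would apply learned projections first.

```python
import math

class KVCache:
    """Hypothetical helper: stores the key and value of every past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Append the new token's key/value, then attend over the whole cache.
        self.keys.append(k)
        self.values.append(v)
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, key)) / scale for key in self.keys]
        # Numerically stable softmax over all cached positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the cached value vectors.
        return [sum(w * val[c] for w, val in zip(weights, self.values))
                for c in range(len(v))]

cache = KVCache()
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# q = k = v = x is an illustrative simplification, not how a trained model works.
outputs = [cache.step(x, x, x) for x in tokens]
```

After three steps the cache holds three keys and three values. Both the stored state and the per-step work grow with the number of generated tokens, which is why memory bounds the usable context in practice.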

## Related Concepts

- [[content-based-reasoning]]: a capability attention has by construction
- [[kv-cache]]: the inference-memory bottleneck of attention
- [[selective-state-space|selection mechanism]]: Mamba's alternative path
- [[gu-mamba|Mamba paper]]

## References

- Vaswani et al. (2017) "Attention Is All You Need"
- [[gu-mamba|Mamba]] (Gu & Dao, 2024)