20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/flex-attention.md
+++ b/concepts/flex-attention.md
@@ -0,0 +1,28 @@
+---
+title: "FlexAttention"
+created: 2025-06-02
+updated: 2025-06-02
+type: concept
+tags: [attention, pytorch, placeholder]
+sources: []
+---
+
+# FlexAttention
+
+> PyTorch's programmable attention API (Dong et al., 2024). It accepts user-defined attention masks, unlike FlashAttention-2, which supports only fixed mask patterns.
+
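+## Core features
+
+- Supports custom attention masks (BlockMask)
+- Programming model: the attention pattern is described in PyTorch code and compiled automatically into an efficient kernel
+- Used as the key implementation backend in [[goru-one-pass-to-reason-2025|One-Pass to Reason]]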
+
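+## Relationship to FlashAttention
+
+[[flash-attention|FlashAttention-2]] is fast but does not support custom masks. FlexAttention adds mask flexibility at the cost of being somewhat slower (roughly 15–20%), and it has no substitute where a custom attention pattern is required, for example [[block-sparse-attention|block-sparse attention]].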
+
+## Related
+
+- [[flash-attention]]
+- [[block-sparse-attention]]
+- [[goru-one-pass-to-reason-2025|One-Pass to Reason]]
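
The programming model listed under the core features of the note above can be illustrated with a minimal plain-Python sketch. It is not the PyTorch implementation: a mask is written as a function of `(b, h, q_idx, kv_idx)`, and a block-level summary marks tiles that are entirely masked and could therefore be skipped, which is the idea behind a block mask. The helper names `sliding_window_causal`, `dense_mask` and `block_summary` are hypothetical and exist only for this illustration. In PyTorch itself, a function of the same shape would, as far as I know, be passed to `create_block_mask` from `torch.nn.attention.flex_attention`.

```python
# Plain-Python illustration of the mask-function idea; not the PyTorch kernel.

def causal_mask(b, h, q_idx, kv_idx):
    """Return True where query position q_idx may attend to key position kv_idx."""
    return q_idx >= kv_idx


def sliding_window_causal(window):
    """Hypothetical helper: causal mask restricted to the last `window` keys."""
    def mask_mod(b, h, q_idx, kv_idx):
        return (q_idx >= kv_idx) and (q_idx - kv_idx < window)
    return mask_mod


def dense_mask(mask_mod, q_len, kv_len):
    """Materialize the mask as a q_len x kv_len grid of booleans (batch 0, head 0)."""
    return [[mask_mod(0, 0, q, k) for k in range(kv_len)] for q in range(q_len)]


def block_summary(mask, block_size):
    """Label each tile 'skip' (all masked), 'full' (none masked) or 'partial'."""
    q_len, kv_len = len(mask), len(mask[0])
    summary = []
    for q0 in range(0, q_len, block_size):
        row = []
        for k0 in range(0, kv_len, block_size):
            tile = [
                mask[q][k]
                for q in range(q0, min(q0 + block_size, q_len))
                for k in range(k0, min(k0 + block_size, kv_len))
            ]
            if all(tile):
                row.append("full")
            elif not any(tile):
                row.append("skip")
            else:
                row.append("partial")
        summary.append(row)
    return summary


mask = dense_mask(causal_mask, 4, 4)
blocks = block_summary(mask, 2)
for row in blocks:
    print(row)
```

Tiles labelled `skip` need no computation at all, and tiles labelled `full` need no per-element mask check; only `partial` tiles have to evaluate the mask function element by element. Changing the attention pattern means swapping the mask function, for example `dense_mask(sliding_window_causal(2), 4, 4)`.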