20260617: currently 914 pages
28
concepts/flex-attention.md
Normal file
@@ -0,0 +1,28 @@
---
title: "FlexAttention"
created: 2025-06-02
updated: 2025-06-02
type: concept
tags: [attention, pytorch, placeholder]
sources: []
---

# FlexAttention

> PyTorch's programmable attention API (Dong et al., 2024). It accepts user-defined attention masks, unlike FlashAttention-2, which supports only a fixed set of mask patterns.
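
## Core features

- Supports custom attention masks, expressed as a `BlockMask`
- Programming model: attention patterns are described in ordinary PyTorch code and compiled automatically into efficient kernels
- Used as a key implementation backend in [[goru-one-pass-to-reason-2025|One-Pass to Reason]]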
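
The programming model above can be illustrated without PyTorch. The sketch below is a pure-Python stand-in, not the library's implementation: it assumes a mask is a predicate over `(b, h, q_idx, kv_idx)` and then classifies each tile of the score matrix as skippable, fully visible, or partially masked, which is roughly the kind of block-level summary a `BlockMask` carries. The names `MaskMod`, `sliding_window`, and `block_summary` are hypothetical helpers introduced for illustration. In PyTorch itself, such a predicate would be handed to `create_block_mask` in `torch.nn.attention.flex_attention`.

```python
from typing import Callable, List

# Hypothetical alias: (batch, head, q_idx, kv_idx) -> bool,
# where True means "this query position may attend to this key position".
MaskMod = Callable[[int, int, int, int], bool]


def causal(b: int, h: int, q_idx: int, kv_idx: int) -> bool:
    """A query may only look at itself and earlier positions."""
    return q_idx >= kv_idx


def sliding_window(window: int) -> MaskMod:
    """Causal mask restricted to the most recent `window` positions."""
    def mask_mod(b: int, h: int, q_idx: int, kv_idx: int) -> bool:
        return q_idx >= kv_idx and q_idx - kv_idx < window
    return mask_mod


def block_summary(mask_mod: MaskMod, q_len: int, kv_len: int,
                  block: int) -> List[List[str]]:
    """Classify each (block x block) tile as 'skip', 'full', or 'partial'."""
    rows = []
    for q0 in range(0, q_len, block):
        row = []
        for k0 in range(0, kv_len, block):
            vals = [mask_mod(0, 0, q, k)
                    for q in range(q0, min(q0 + block, q_len))
                    for k in range(k0, min(k0 + block, kv_len))]
            if all(vals):
                row.append("full")      # no per-element masking needed
            elif any(vals):
                row.append("partial")   # mask must be applied inside the tile
            else:
                row.append("skip")      # tile can be skipped entirely
        rows.append(row)
    return rows


causal_tiles = block_summary(causal, q_len=8, kv_len=8, block=4)
window_tiles = block_summary(sliding_window(1), q_len=8, kv_len=8, block=4)
```

For the causal mask, the upper-right tile is classified as `skip` and the lower-left tile as `full`; only the diagonal tiles need element-wise masking. A kernel that knows this can avoid work on skipped tiles, which is the intuition behind describing a pattern in code and letting a compiler specialize it.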
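
## Relationship to FlashAttention

[[flash-attention|FlashAttention-2]] is fast but does not support custom masks. FlexAttention provides mask flexibility at the cost of somewhat lower speed (roughly 15–20% slower), but it is irreplaceable in scenarios that require custom attention patterns, such as [[block-sparse-attention|block-sparse attention]].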
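
To make the block-sparse case concrete, here is a minimal pure-Python reference, assuming a block-diagonal pattern in which each query attends only to keys in its own block. It is a sketch for reasoning about correctness and the number of scores computed, not a benchmark and not how either library implements its kernels. `block_diagonal` and `masked_attention` are hypothetical names, and the code assumes every query has at least one allowed key.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def block_diagonal(block: int) -> Callable[[int, int], bool]:
    """Allow attention only between positions in the same block."""
    def allowed(q_idx: int, kv_idx: int) -> bool:
        return q_idx // block == kv_idx // block
    return allowed


def masked_attention(q: List[Vector], k: List[Vector], v: List[Vector],
                     allowed: Callable[[int, int], bool]
                     ) -> Tuple[List[Vector], int]:
    """Softmax attention that only scores allowed (query, key) pairs.

    Returns the outputs and the number of scores actually computed.
    Assumes each query has at least one allowed key.
    """
    d = len(q[0])
    out, n_scores = [], 0
    for i, qi in enumerate(q):
        keys = [j for j in range(len(k)) if allowed(i, j)]
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
                  for j in keys]
        n_scores += len(scores)
        m = max(scores)                          # subtract max for stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * v[j][c] for wj, j in zip(w, keys)) / z
                    for c in range(len(v[0]))])
    return out, n_scores


q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
# Values tag which block each key belongs to: block 0 -> [1, 0], block 1 -> [0, 1].
v = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

out, n_scores = masked_attention(q, k, v, block_diagonal(2))
```

With four positions and a block size of 2, only 8 of the 16 possible scores are computed, and each output is a mixture of values from the query's own block only. A fixed-pattern kernel cannot express an arbitrary predicate like `allowed`, which is the gap the custom-mask route fills.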

## Related

- [[flash-attention]]
- [[block-sparse-attention]]
- [[goru-one-pass-to-reason-2025|One-Pass to Reason]]