---
title: "FlexAttention"
created: 2025-06-02
updated: 2025-06-02
type: concept
tags: [attention, pytorch, placeholder]
sources: []
---
# FlexAttention
> PyTorch's programmable attention API (Dong et al., 2024). It accepts custom attention masks, unlike the fixed mask patterns of FlashAttention-2.
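## Core features
- Supports custom attention masks (`BlockMask`)
- Programming model: describe the attention pattern in PyTorch code, which is automatically compiled into an efficient kernel
- Used as the key implementation backend in [[goru-one-pass-to-reason-2025|One-Pass to Reason]]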
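
To make the programming model above concrete, here is a minimal conceptual sketch in plain Python. It is not PyTorch's implementation: it only illustrates how a predicate over `(b, h, q_idx, kv_idx)` can be summarized into a block-level mask, which is the idea behind `BlockMask`. The helper names `sliding_window` and `block_summary` are hypothetical and exist only for this illustration; in PyTorch the corresponding entry points are `create_block_mask` and `flex_attention` in `torch.nn.attention.flex_attention`.

```python
# Conceptual sketch only: plain Python, not how PyTorch implements BlockMask.

def causal(b, h, q_idx, kv_idx):
    """mask_mod-style predicate: True where query q_idx may attend to key kv_idx."""
    return q_idx >= kv_idx


def sliding_window(window):
    """Hypothetical helper: causal predicate restricted to the last `window` keys."""
    def mask_mod(b, h, q_idx, kv_idx):
        return q_idx >= kv_idx and q_idx - kv_idx < window
    return mask_mod


def block_summary(mask_mod, q_len, kv_len, block_size):
    """Classify each (query block, key block) tile as 'empty', 'full' or 'partial'.

    Assumption for illustration: a kernel can skip 'empty' tiles entirely and
    needs per-element mask checks only on 'partial' tiles.
    """
    rows = []
    for q0 in range(0, q_len, block_size):
        row = []
        for k0 in range(0, kv_len, block_size):
            hits = [
                mask_mod(0, 0, q, k)
                for q in range(q0, min(q0 + block_size, q_len))
                for k in range(k0, min(k0 + block_size, kv_len))
            ]
            if not any(hits):
                row.append("empty")
            elif all(hits):
                row.append("full")
            else:
                row.append("partial")
        rows.append(row)
    return rows


summary = block_summary(causal, q_len=8, kv_len=8, block_size=4)
for row in summary:
    print(row)
```

Swapping `causal` for `sliding_window(2)` changes only the predicate; the tiling logic stays the same. That separation between the mask description and the kernel is what the bullet on the programming model refers to.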
## Relationship to FlashAttention
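[[flash-attention|FlashAttention-2]] is fast but does not support custom masks. FlexAttention adds mask flexibility at a slightly lower speed (roughly 15-20% slower), but it is irreplaceable in scenarios that need custom attention patterns, such as [[block-sparse-attention|block-sparse attention]].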
## Related
- [[flash-attention]]
- [[block-sparse-attention]]
- [[goru-one-pass-to-reason-2025|One-Pass to Reason]]