20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/sequence-packing.md
+++ b/concepts/sequence-packing.md
@@ -0,0 +1,30 @@
+---
+title: "Sequence Packing (序列打包)"
+created: 2025-06-02
+updated: 2025-06-02
+type: concept
+tags: [training-optimization, efficiency, placeholder]
+sources: []
+---
+
+# Sequence Packing
+
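+> A training technique that concatenates multiple short sequences into one long sequence to improve GPU utilization (Krell et al., 2022); position IDs are needed to prevent cross-sample attention contamination.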
+
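+## Core Idea
+
+In supervised fine-tuning, sequence lengths within a batch are usually uneven, so padding every sequence to the longest one wastes compute. Sequence packing concatenates multiple short sequences so that the GPU processes as many real tokens as possible.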
+
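+## Implementation Notes
+
+1. **Contamination-free guarantee**: assign each sequence its own position ID range to prevent attention leakage between different sequences
+2. **Mask composition**: the packing mask (which prevents cross-sample contamination) can be combined with a custom attention mask via logical AND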
+
+## Application in One-Pass to Reason
+
+The Flex-Pack configuration in [[goru-one-pass-to-reason-2025]] stacks sequence packing with [[block-sparse-attention]] to achieve the best speedup.
+
+## Related
+
+- [[goru-one-pass-to-reason-2025|One-Pass to Reason]]
+- [[block-sparse-attention]]
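
The note added in this commit describes packing, per-sequence position IDs, and AND-combined masks only in prose. The following is a minimal sketch of those three ideas in plain Python, not the implementation used in the cited paper. The names `pack_sequences`, `build_inputs`, `packing_mask`, `causal_mask`, and `combine_masks` are hypothetical, and first-fit-decreasing is only one of several possible packing heuristics. The sketch also assumes that the packing mask is derived from explicit segment IDs; the note itself only mentions position ID ranges.

```python
def pack_sequences(seqs, max_len):
    """Greedy first-fit-decreasing packing (an assumed heuristic).

    Returns a list of bins; each bin is a list of sequences whose
    total length does not exceed max_len.
    """
    bins = []
    for seq in sorted(seqs, key=len, reverse=True):
        if len(seq) > max_len:
            raise ValueError("sequence longer than max_len")
        for b in bins:
            if sum(len(s) for s in b) + len(seq) <= max_len:
                b.append(seq)
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([seq])
    return bins


def build_inputs(packed_bin):
    """Flatten one bin into tokens, position IDs, and segment IDs.

    Position IDs restart at 0 for every original sequence, so each
    sequence keeps its own position range inside the packed row.
    """
    tokens, position_ids, segment_ids = [], [], []
    for seg, seq in enumerate(packed_bin):
        tokens.extend(seq)
        position_ids.extend(range(len(seq)))
        segment_ids.extend([seg] * len(seq))
    return tokens, position_ids, segment_ids


def packing_mask(segment_ids):
    """True where query i and key j come from the same original sequence."""
    n = len(segment_ids)
    return [[segment_ids[i] == segment_ids[j] for j in range(n)]
            for i in range(n)]


def causal_mask(n):
    """True where key j is not in the future of query i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def combine_masks(a, b):
    """Element-wise logical AND of two boolean masks of equal shape."""
    return [[x and y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


seqs = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
bins = pack_sequences(seqs, max_len=5)
tokens, position_ids, segment_ids = build_inputs(bins[1])
mask = combine_masks(packing_mask(segment_ids), causal_mask(len(tokens)))
```

In this sketch the causal mask stands in for the "custom attention mask" mentioned in the note; any other boolean mask of the same shape could be passed to `combine_masks` instead.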