20260617: currently 914 pages
concepts/k-pass-training.md
---
title: "K-Pass Training"
created: 2025-06-02
updated: 2025-06-02
type: concept
tags: [training-optimization, multi-turn-reasoning, efficiency]
sources: ["[[goru-one-pass-to-reason-2025]]"]
---

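# K-Pass Training

> A continuum of training schemes proposed in [[goru-one-pass-to-reason-2025|One-Pass to Reason]]. It spans the range between the most memory-efficient extreme (N-Pass) and the fastest extreme (1-Pass), offering a flexible speed–memory trade-off.
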
## Motivation

[[one-pass-fine-tuning|1-Pass]] and N-Pass are the two extremes:

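- N-Pass: one forward pass per turn; lowest memory, slowest
- 1-Pass: a single forward pass over the whole conversation; +33% memory, fastest

K-Pass lets users interpolate between these two extremes and choose the memory/speed balance they need.
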
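## Implementation

1. **Chunking**: split the N turns evenly into K segments of ⌈N/K⌉ turns each
2. **1-Pass within a segment**: inside the current segment, apply token duplication plus a custom attention mask
3. **Sequential processing across segments**: earlier segments serve as fixed context for later ones (their tokens are not duplicated)
4. **Loss isolation**: compute the loss only on `t_i` and `r_i_out` of the turns in the current segment
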
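
The four steps can be sketched in Python as follows. This is a minimal, unoptimized illustration, not the paper's code, and it rests on assumptions the note does not spell out: each turn is taken to consist of a human message `h_i`, thinking tokens `t_i` and a response `r_i`; token duplication is modeled as two copies of the response, `r_i_out` (trained on, may attend to `t_i`) and `r_i_in` (what later turns attend to, blind to `t_i`), so that later turns see earlier ones without their reasoning, which is assumed to mirror inference. The names `Turn`, `split_into_segments`, `build_segment`, `can_attend` and `k_pass` are hypothetical, the mask is a dense boolean matrix, and position ids for the duplicated copies are not handled.

```python
import math
from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical token ids for one turn: human message, thinking, response.
    human: list[int]
    think: list[int]
    response: list[int]


def split_into_segments(n_turns: int, k: int) -> list[range]:
    """Step 1: contiguous segments of ceil(N/K) turns (the last one may be shorter)."""
    if n_turns < 1 or k < 1:
        raise ValueError("need at least one turn and K >= 1")
    size = math.ceil(n_turns / k)
    return [range(s, min(s + size, n_turns)) for s in range(0, n_turns, size)]


def build_segment(turns: list[Turn], seg: range):
    """Steps 2-3: earlier turns as fixed context, then duplicated tokens for this segment."""
    tokens: list[int] = []
    tags: list[tuple[str, int]] = []  # (role, turn index) for every token

    def emit(ids: list[int], role: str, i: int) -> None:
        tokens.extend(ids)
        tags.extend((role, i) for _ in ids)

    # Earlier segments: human message + response only, no thinking, no duplication.
    for i in range(seg.start):
        emit(turns[i].human, "h", i)
        emit(turns[i].response, "r_in", i)
    # Current segment (assumed layout per turn: h_i, t_i, r_i_out, r_i_in).
    for i in seg:
        emit(turns[i].human, "h", i)
        emit(turns[i].think, "t", i)
        emit(turns[i].response, "r_out", i)  # copy that receives the loss
        if i != seg[-1]:  # copy later turns attend to; nothing follows the last turn
            emit(turns[i].response, "r_in", i)
    return tokens, tags


# Which roles of the SAME turn each query role may attend to (assumed rules).
SAME_TURN = {
    "h": {"h"},
    "t": {"h", "t"},
    "r_out": {"h", "t", "r_out"},
    "r_in": {"h", "r_in"},
}


def can_attend(q: int, key: int, tags: list[tuple[str, int]]) -> bool:
    if key > q:
        return False  # causal
    (q_role, q_turn), (k_role, k_turn) = tags[q], tags[key]
    if k_turn < q_turn:
        # Earlier turns are visible only without their thinking.
        return k_role in ("h", "r_in")
    return k_role in SAME_TURN[q_role]


def k_pass(turns: list[Turn], k: int):
    """Yield (tokens, attention mask, loss mask) for each sequential forward pass."""
    for seg in split_into_segments(len(turns), k):
        tokens, tags = build_segment(turns, seg)
        n = len(tokens)
        mask = [[can_attend(q, key, tags) for key in range(n)] for q in range(n)]
        # Step 4: loss only on t_i and r_i_out of turns in this segment.
        loss = [role in ("t", "r_out") and i in seg for role, i in tags]
        yield tokens, mask, loss
```

Under the ⌈N/K⌉ rule the number of passes can fall below K: `split_into_segments(5, 4)` yields segments of 2, 2 and 1 turns, so only three passes run. In this sketch each later pass also re-encodes all earlier turns as fixed context.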
## Speed–memory trade-off

| K | Meaning | Speedup vs. N-Pass (8B) | Extra memory vs. N-Pass |
|---|------|-----------|---------|
| 1 | 1-Pass (fastest) | 1.54× | +34% |
| 2 | Balanced | 1.37× | +21% |
| 4 | — | 1.09× | +17% |
| 6 | — | 0.88× | +14% |
| N | N-Pass (least memory) | 1.00× | 0% |

**Key finding**: returns diminish beyond K = 4, and at K = 6 training is already slower than N-Pass (0.88×). On long sequences, the overhead of token duplication starts to outweigh the savings from merging only a few turns per pass.

## Recommended strategy

- **Ample memory**: K=1 (1-Pass) to maximize speed
- **Moderate memory**: K=2, trading 21% extra memory for a 37% speedup (1.37×)
- **Tight memory**: K=4, or simply use N-Pass

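
The recommendations amount to picking the fastest K whose extra memory fits the available headroom. A minimal sketch of that rule, assuming the 8B figures from the table transfer to your model and data (they need not) and that both columns are measured relative to N-Pass; `TRADEOFFS_8B` and `choose_k` are hypothetical names.

```python
# (K, speedup vs. N-Pass, extra memory in % vs. N-Pass): the 8B rows of the table.
TRADEOFFS_8B = [
    ("1", 1.54, 34),
    ("2", 1.37, 21),
    ("4", 1.09, 17),
    ("6", 0.88, 14),
    ("N", 1.00, 0),
]


def choose_k(memory_headroom_pct: float) -> str:
    """Return the fastest K whose extra memory fits in the available headroom."""
    feasible = [row for row in TRADEOFFS_8B if row[2] <= memory_headroom_pct]
    if not feasible:
        raise ValueError("memory headroom must be at least 0%")
    # Pick the feasible row with the highest speedup.
    return max(feasible, key=lambda row: row[1])[0]
```

With 15% headroom, K = 6 (+14%) fits but runs at 0.88×, slower than N-Pass, so the rule falls back to N-Pass, in line with the key finding above.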
## Related

- [[one-pass-fine-tuning]]
- [[token-duplication]]
- [[goru-one-pass-to-reason-2025|One-Pass to Reason (paper)]]