20260601
40
concepts/coarse-to-fine-granularity.md
Normal file
@@ -0,0 +1,40 @@
---
title: "Coarse-to-Fine Granularity"
created: 2026-05-29
updated: 2026-05-29
type: concept
tags: ["training-schedule", "efficiency", "multi-modal", "design-pattern"]
sources: ["https://arxiv.org/abs/2605.06546"]
---
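
# Coarse-to-Fine Granularity (coarse → fine scheduling)

**Coarse-to-Fine Granularity** is a cross-modal design pattern for training efficiency: train first on a coarse-grained, high-throughput representation, then gradually switch to a fine-grained one. It is a reusable design principle that recurs across vision, language, and multimodal models.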

## In Language Models

- **[[token-superposition-training|TST]]** (Peng et al. 2026): average of s tokens → standard tokens
- **SuperBPE** (Liu et al.): merge BPE tokens → supertokens
- **Bolmo** (Minixhofer et al.): subword → byte-level
- **Patch-Level Training** (Shao et al.): patch average → standard tokens
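
The TST and Patch-Level Training entries above both describe averaging several tokens into one coarse unit. A minimal sketch of that averaging step follows, assuming it operates on token embedding vectors. The function name `average_groups`, the toy data, and the handling of a short final group are illustrative assumptions, not details taken from either paper.

```python
from typing import List

Vector = List[float]


def average_groups(embeddings: List[Vector], s: int) -> List[Vector]:
    """Average every s consecutive embedding vectors into one coarse vector.

    Assumption: a trailing group shorter than s is averaged over the
    vectors it actually contains.
    """
    if s < 1:
        raise ValueError("s must be a positive integer")
    coarse = []
    for start in range(0, len(embeddings), s):
        group = embeddings[start:start + s]
        dim = len(group[0])
        # Element-wise mean over the vectors in this group.
        coarse.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return coarse


# Four toy 2-d token embeddings (hypothetical values).
tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 6.0]]
coarse_tokens = average_groups(tokens, s=2)  # coarse phase: 2 positions
fine_tokens = average_groups(tokens, s=1)    # fine phase: 4 positions, unchanged
```

With `s=2` the sequence shrinks from four positions to two, so the same number of model positions covers twice as many raw tokens. With `s=1` the input passes through unchanged, which corresponds to the standard-token phase.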

## In Vision Models

- **ViT patch size scheduling** (Anagnostidis et al.): large patches → small patches
- Same idea in essence: patch size controls the "input granularity" of a ViT
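
To make "input granularity" concrete, the sketch below counts how many patch tokens a ViT sees for a given patch size. It assumes non-overlapping square patches and image sides that are exact multiples of the patch size. The helper `vit_token_count` and the 32 → 16 schedule are hypothetical illustrations, not the schedule used by Anagnostidis et al.

```python
def vit_token_count(height: int, width: int, patch_size: int) -> int:
    """Number of non-overlapping square patches covering an image.

    Assumption: both image sides are exact multiples of patch_size.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


# Hypothetical coarse-to-fine patch schedule for a 224x224 image.
schedule = [32, 16]
counts = {p: vit_token_count(224, 224, p) for p in schedule}
```

Here `counts` is `{32: 49, 16: 196}`: halving the patch side quadruples the sequence length, which is why the large-patch phase is much cheaper per image.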
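
## Principle

A coarse-grained representation means each training sample carries more "raw information" at lower resolution. In practice this amounts to:
- Raw-data throughput rises by a factor of s at equal compute (s being the coarsening factor, e.g. the number of tokens merged into one s-token)
- Coarse statistical structure is learned first, and the details are refined afterwards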
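
The two points above can be sketched as a simple two-phase schedule. This is a sketch under stated assumptions: the switch from coarse to fine is a single hard cut (the note itself speaks of a gradual switch), and the compute per step depends only on the number of input positions, not on how many raw units each position covers. All names and numbers are hypothetical.

```python
def group_size_at(step: int, total_steps: int, coarse_fraction: float, s: int) -> int:
    """Return how many fine-grained units one input position covers at this step."""
    coarse_steps = int(total_steps * coarse_fraction)
    return s if step < coarse_steps else 1


def raw_units_consumed(total_steps: int, positions_per_step: int,
                       coarse_fraction: float, s: int) -> int:
    """Total fine-grained units seen over the whole run.

    Assumption: every step processes the same number of positions,
    so every step costs the same compute.
    """
    return sum(
        positions_per_step * group_size_at(t, total_steps, coarse_fraction, s)
        for t in range(total_steps)
    )


# Hypothetical run: 1000 steps, 512 positions per step, s = 4.
baseline = raw_units_consumed(1000, 512, coarse_fraction=0.0, s=4)
two_phase = raw_units_consumed(1000, 512, coarse_fraction=0.25, s=4)
ratio = two_phase / baseline
```

In this toy setting the two-phase run consumes 896,000 raw units against 512,000 for the baseline, a ratio of 1.75 at the same step budget.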
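
## Efficiency Formula

In the compute-bound regime (training is limited by FLOPs rather than by the amount of data), a coarse-to-fine schedule essentially trades **more data throughput** for **faster loss reduction**, which is consistent with the [[throughput-hypothesis]].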
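
One way to write the throughput gain down, as a back-of-the-envelope sketch rather than a result from the cited sources: assume a fixed budget of $N$ steps with $B$ input positions each, a fraction $r$ of the steps spent in the coarse phase, and each coarse position covering $s$ fine-grained units. The symbols $N$, $B$, and $r$ are notation introduced here for illustration. If compute per step depends only on $B$, the amount of raw data seen is

$$
D_{\text{c2f}} = N B \,\bigl[\, r s + (1 - r) \,\bigr],
\qquad
\frac{D_{\text{c2f}}}{D_{\text{base}}} = \frac{N B \,\bigl[\, r s + (1 - r) \,\bigr]}{N B} = 1 + r\,(s - 1).
$$

For example, $r = 0.5$ and $s = 2$ give a ratio of $1.5$: the run sees 50% more raw data for the same compute. Whether that extra data translates into a lower loss is the claim of the throughput hypothesis, not something this arithmetic shows.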

## Related

- [[token-superposition-training]]: an instance in language models
- [[throughput-hypothesis]]: the throughput hypothesis
- [[two-phase-pretraining]]: the training paradigm that implements coarse → fine scheduling