20260625: lots of new content
concepts/self-resampling.md
@@ -0,0 +1,52 @@
---
title: "Self-Resampling"
created: 2026-06-20
updated: 2026-06-20
type: concept
tags: ["training", "autoregressive", "streaming", "diffusion", "self-play"]
sources: ["https://arxiv.org/abs/2606.17800"]
---
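
# Self-Resampling

**Self-Resampling** is a streaming autoregressive training technique proposed by [[maineCoon|MaineCoon]]: during training, the model is conditioned on **noisy, degraded history that it generated itself**, rather than only on clean ground-truth history. This closes the train-test gap common in autoregressive models.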

## Motivation: Train-Test Gap

Standard training of autoregressive diffusion models conditions on **clean history**:

```
Training:  p(x_t | clean x_<t)
Inference: p(x_t | self-generated x_<t)   ← different conditioning distribution!
```
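
At inference time the model conditions on its own imperfect generated history. This distribution shift causes errors to accumulate and content to drift.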

## Mechanism

MaineCoon prepares two kinds of history in every training iteration:
1. **Clean history** (standard): uses the ground-truth chunks
2. **Degraded history** (produced by self-resampling):
   - The model, conditioned on the clean history, runs a **gradient-free** forward pass to generate a candidate chunk
   - Gaussian noise is added to that candidate → `degraded_chunk`
   - `degraded_chunk` replaces the corresponding clean chunk, yielding the degraded history
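
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MaineCoon's implementation: `build_degraded_history`, `add_gaussian_noise`, `toy_generate`, the `index` and `sigma` parameters, and the choice to condition the candidate on the clean chunks before `index` are all hypothetical. In an autograd framework the `generate` call would run with gradient tracking disabled.

```python
import random
from typing import Callable, List, Sequence

Chunk = List[float]


def add_gaussian_noise(chunk: Sequence[float], sigma: float, rng: random.Random) -> Chunk:
    """Add i.i.d. zero-mean Gaussian noise with standard deviation sigma."""
    return [value + rng.gauss(0.0, sigma) for value in chunk]


def build_degraded_history(
    clean_history: Sequence[Chunk],
    generate: Callable[[List[Chunk]], Chunk],
    index: int,
    sigma: float,
    rng: random.Random,
) -> List[Chunk]:
    """Return a copy of clean_history whose chunk at `index` is self-generated and noised."""
    # Assumption: the candidate is conditioned on the clean chunks before `index`.
    prefix = [list(chunk) for chunk in clean_history[:index]]
    candidate = generate(prefix)  # rollout only; no gradients are tracked here
    degraded_chunk = add_gaussian_noise(candidate, sigma, rng)
    degraded_history = [list(chunk) for chunk in clean_history]  # leave the input untouched
    degraded_history[index] = degraded_chunk
    return degraded_history


def toy_generate(prefix: List[Chunk]) -> Chunk:
    """Hypothetical stand-in model: repeat the last chunk, shrunk by 10%."""
    if not prefix:
        return [0.0, 0.0]
    return [0.9 * value for value in prefix[-1]]


clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
degraded = build_degraded_history(clean, toy_generate, index=1, sigma=0.1, rng=random.Random(0))
```

Only the chunk at `index` changes; every other chunk in `degraded` is a copy of its clean counterpart.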

The training objective computes a loss on both histories:
```
L = L_clean + L_degraded
```
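
As a rough illustration of this objective, the sketch below sums a mean-squared-error loss over a clean and a degraded history. The one-parameter predictor `predict_next`, the MSE loss, and all names and values are assumptions chosen for brevity; the source does not specify the per-history loss.

```python
from typing import List, Sequence

Chunk = List[float]


def predict_next(w: float, history: Sequence[Chunk]) -> Chunk:
    """Hypothetical model: predict the next chunk as w times the last chunk in the history."""
    return [w * value for value in history[-1]]


def mse(prediction: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length chunks."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


def combined_loss(
    w: float,
    clean_history: Sequence[Chunk],
    degraded_history: Sequence[Chunk],
    target: Chunk,
) -> float:
    """L = L_clean + L_degraded, both measured against the same ground-truth target."""
    loss_clean = mse(predict_next(w, clean_history), target)
    loss_degraded = mse(predict_next(w, degraded_history), target)
    return loss_clean + loss_degraded


clean_history = [[0.5, 1.0], [1.0, 2.0]]
degraded_history = [[0.5, 1.0], [1.5, 2.5]]  # last chunk replaced by a degraded one
target = [2.0, 4.0]
total = combined_loss(2.0, clean_history, degraded_history, target)
```

With these toy values the clean term is zero, so the whole loss comes from the degraded history: the model is penalized for failing to recover the target from imperfect context.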
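
## Key Design Choices

- **Stop-gradient**: the self-resampling rollout runs under `stop_grad`; gradients flow back only through the main forward pass
- **Probabilistic mixing**: the degraded history is used with probability `p`, the clean history with probability `1-p`
- **No teacher required**: does not rely on forcing distillation from a non-causal teacher model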
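
The first two choices can be illustrated with a scalar toy example. This is a sketch under assumptions, not the paper's code: `pick_history` and `grad_with_stop_grad` are hypothetical names, and the model is reduced to a single weight `w` so the gradient can be written by hand.

```python
import random
from typing import List, Sequence

Chunk = List[float]


def pick_history(
    clean_history: Sequence[Chunk],
    degraded_history: Sequence[Chunk],
    p: float,
    rng: random.Random,
) -> Sequence[Chunk]:
    """Return the degraded history with probability p, otherwise the clean one."""
    return degraded_history if rng.random() < p else clean_history


def grad_with_stop_grad(w: float, clean_value: float, target: float) -> float:
    """Gradient of (w * d - target)**2 with respect to w, where d = stop_grad(w * clean_value)."""
    d = w * clean_value        # self-resampling rollout, treated as a constant below
    residual = w * d - target  # main forward pass conditioned on the rollout
    return 2.0 * residual * d  # no d(d)/dw term: the rollout is detached


rng = random.Random(0)
clean_history = [[1.0, 2.0]]
degraded_history = [[1.1, 1.9]]
chosen = pick_history(clean_history, degraded_history, p=0.5, rng=rng)
gradient = grad_with_stop_grad(w=2.0, clean_value=1.0, target=3.0)
```

Without the stop-gradient, the chain rule would add a term for the dependence of `d` on `w`, so the update would also try to change the rollout itself rather than only the prediction made from it.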

## Effects

- Trains directly in the chunk-by-chunk causal regime used at inference
- The model is inherently robust to degraded context during long-horizon deployment
- Compatible with post-training methods such as [[reinforced-online-policy-distillation|ROPD]]

## References

- [[maineCoon|MaineCoon paper]], Section 3.1
- [[native-streaming-ar-training|Native streaming AR training]]
- Resampling Forcing (Cui et al.)