---
title: "Self-Resampling"
created: 2026-06-20
updated: 2026-06-20
type: concept
tags: ["training", "autoregressive", "streaming", "diffusion", "self-play"]
sources: ["https://arxiv.org/abs/2606.17800"]
---
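# Self-Resampling
**Self-Resampling** is a streaming autoregressive training technique proposed by [[maineCoon|MaineCoon]]: during training, the model is conditioned on a **noisy, degraded history that it generated itself**, rather than only on clean ground-truth history. This closes the train-test gap common to autoregressive models.
## Motivation: Train-Test Gap
Standard training of autoregressive diffusion models conditions on **clean history**: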
```
Training:  p(x_t | clean x_<t)
Inference: p(x_t | self-generated x_<t)  ← different conditioning distribution!
```
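At inference time the model sees its own imperfect generation history; this distribution shift causes error accumulation and content drift.
## Mechanism
In each training iteration, MaineCoon prepares two kinds of history:
1. **Clean history** (standard): uses the ground-truth chunks
2. **Degraded history** (generated via self-resampling):
   - Conditioned on the clean history, the model generates a candidate chunk in a **gradient-free** forward pass
   - Gaussian noise is added to that candidate, yielding `degraded_chunk`
   - `degraded_chunk` replaces the corresponding clean chunk to construct the degraded history

The training objective computes the loss on both histories: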
```
L = L_clean + L_degraded
```
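The mechanism can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not MaineCoon's implementation: chunks are plain floats, `toy_model` is a hypothetical stand-in predictor, the replaced position `index` is drawn uniformly at random, and squared error stands in for the real training loss. In an autodiff framework, the candidate rollout would additionally run without gradient tracking.

```python
import random


def toy_model(history, weight):
    """Hypothetical stand-in for the generator: weight * mean(history)."""
    return weight * sum(history) / len(history)


def build_degraded_history(clean_history, index, weight, noise_std, rng):
    """Replace the chunk at `index` with a noised, model-generated candidate."""
    # Gradient-free rollout: the candidate is conditioned on the clean prefix.
    candidate = toy_model(clean_history[:index], weight)
    # Add Gaussian noise to obtain the degraded chunk.
    degraded_chunk = candidate + rng.gauss(0.0, noise_std)
    # Copy the clean history and swap in the degraded chunk.
    degraded_history = list(clean_history)
    degraded_history[index] = degraded_chunk
    return degraded_history


def squared_error(prediction, target):
    """Assumed placeholder loss; the real objective is not specified here."""
    return (prediction - target) ** 2


def training_loss(clean_history, target, weight, noise_std, rng):
    """Compute L = L_clean + L_degraded for one toy training example."""
    # Assumption: pick the replaced position uniformly, keeping chunk 0 clean
    # so the rollout always has a non-empty prefix to condition on.
    index = rng.randrange(1, len(clean_history))
    degraded_history = build_degraded_history(
        clean_history, index, weight, noise_std, rng
    )
    loss_clean = squared_error(toy_model(clean_history, weight), target)
    loss_degraded = squared_error(toy_model(degraded_history, weight), target)
    return loss_clean + loss_degraded, degraded_history
```

For example, `training_loss([1.0, 2.0, 3.0], 4.0, 1.0, 0.1, random.Random(0))` returns the summed loss together with the degraded history that was used for the second term.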
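## Key Design Choices
- **Stop-gradient**: the self-resampling rollout runs under `stop_grad`; gradients flow back only through the main forward pass
- **Probabilistic mixing**: the degraded history is used with probability `p`, and the clean history with probability `1-p`
- **No teacher required**: does not depend on forcing-style distillation from a non-causal teacher model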
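The probabilistic mixing rule can be illustrated with a minimal sketch. It assumes the choice is an independent Bernoulli draw per training example; the function name `choose_history` and the `rng` argument are hypothetical and not taken from the source. The stop-gradient part is not shown, since it depends on the autodiff framework in use.

```python
import random


def choose_history(clean_history, degraded_history, p, rng):
    """Return the degraded history with probability p, else the clean one."""
    # rng.random() is uniform on [0, 1), so p=0 always yields the clean
    # history and p=1 always yields the degraded history.
    if rng.random() < p:
        return degraded_history
    return clean_history
```

Drawing the choice per example keeps both conditioning regimes present throughout training, with `p` controlling how often the model sees its own degraded output.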
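## Effects
- Trains directly in the chunk-by-chunk causal regime used at inference
- The model is inherently robust to degraded context during long-horizon deployment
- Compatible with post-training methods such as [[reinforced-online-policy-distillation|ROPD]]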
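For reference, the chunk-by-chunk causal regime mentioned above amounts to a loop in which every new chunk is conditioned only on previously generated chunks. The sketch below is a generic illustration; `rollout` and the `model` callable are hypothetical names, and the real generator is a diffusion model rather than a simple function.

```python
def rollout(model, seed_chunks, num_new_chunks):
    """Generate chunks one at a time, each conditioned on the history so far."""
    history = list(seed_chunks)
    for _ in range(num_new_chunks):
        # The model only ever sees earlier chunks, including its own outputs,
        # which is exactly the self-generated context seen at inference.
        next_chunk = model(history)
        history.append(next_chunk)
    return history
```

Any callable that maps a list of chunks to the next chunk can be passed as `model`, for instance `lambda history: history[-1] + 1` for a quick smoke test.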
## References
- [[maineCoon|MaineCoon paper]], Section 3.1
- [[native-streaming-ar-training|Native streaming AR training]]
- Resampling Forcing (Cui et al.)