SidneyZhang/myWiki

Sidney Zhang 6021dea160

20260625: lots of new content

2026-06-25 14:08:47 +08:00

title: Self-Resampling
created: 2026-06-20
updated: 2026-06-20
type: concept
tags: training, autoregressive, streaming, diffusion, self-play
sources: https://arxiv.org/abs/2606.17800

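Self-Resampling

Self-Resampling is a streaming autoregressive training technique proposed in maineCoon: during training, the model is conditioned on its own generated, noisy, degraded history rather than only on clean ground-truth history. This closes the train-test gap common to autoregressive models.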

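Motivation: Train-Test Gap

Standard training of autoregressive diffusion models conditions on clean history:

Training:  p(x_t | clean x_<t)
Inference: p(x_t | self-generated x_<t)  ← the conditioning distribution differs!

At inference time the model sees its own imperfect generated history. This distribution shift causes error accumulation and content drift.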
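A toy illustration of the gap, offered as a minimal sketch rather than anything taken from the source. The function `step` is a hypothetical stand-in for a model with a small constant bias, and the true sequence is assumed to increase by 1 per step. Conditioning on clean history keeps the per-step error constant, while conditioning on self-generated history lets it accumulate.

```python
def step(x, bias):
    """Hypothetical imperfect model: the true dynamics are x + 1, the model adds `bias`."""
    return x + 1.0 + bias


def teacher_forced_errors(truth, bias):
    """Per-step error when each prediction is conditioned on the clean previous value."""
    return [abs(step(truth[t - 1], bias) - truth[t]) for t in range(1, len(truth))]


def free_running_errors(truth, bias):
    """Per-step error when each prediction is conditioned on the model's own previous output."""
    x = truth[0]
    errors = []
    for t in range(1, len(truth)):
        x = step(x, bias)  # feed the model its own output back in
        errors.append(abs(x - truth[t]))
    return errors


truth = [float(t) for t in range(6)]
clean_errors = teacher_forced_errors(truth, bias=0.05)
self_errors = free_running_errors(truth, bias=0.05)
```

In this toy setting the clean-history error stays near the bias at every step, whereas the self-generated-history error grows roughly linearly with the step index.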

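Mechanism

MaineCoon prepares two kinds of history in every training iteration:

Clean history (standard): uses the ground-truth chunks
Degraded history (generated via self-resampling):
- Conditioned on the clean history, the model generates a candidate chunk in a forward pass without gradients
- Gaussian noise is added to that candidate, yielding degraded_chunk
- degraded_chunk replaces the corresponding clean chunk, forming the degraded history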
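The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `toy_model`, `make_degraded_history`, `index`, and `noise_std` are hypothetical names, chunks are modelled as lists of floats, and the stand-in model simply repeats the mean of the latest chunk. In a real framework the rollout would run inside a no-gradient context.

```python
import random


def toy_model(history):
    """Hypothetical generator: predicts the next chunk as the mean of the latest chunk."""
    last = history[-1]
    mean = sum(last) / len(last)
    return [mean] * len(last)


def make_degraded_history(clean_history, index, noise_std, rng):
    """Return a copy of clean_history whose chunk at `index` is a noised self-generated chunk."""
    if not 1 <= index < len(clean_history):
        raise ValueError("index must leave at least one clean chunk to condition on")
    # Step 1: roll out a candidate conditioned on the clean prefix (no gradients involved here).
    candidate = toy_model(clean_history[:index])
    # Step 2: add Gaussian noise to the candidate.
    degraded_chunk = [v + rng.gauss(0.0, noise_std) for v in candidate]
    # Step 3: swap it in for the clean chunk, leaving the input untouched.
    degraded = [list(chunk) for chunk in clean_history]
    degraded[index] = degraded_chunk
    return degraded


clean = [[1.0, 3.0], [2.0, 4.0], [5.0, 7.0]]
degraded = make_degraded_history(clean, index=2, noise_std=0.1, rng=random.Random(0))
```

Only the chunk at `index` changes; the other chunks stay ground truth, so the degraded history differs from the clean one exactly where the model's own output was substituted.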

The training objective computes the loss on both histories:

L = L_clean + L_degraded
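A minimal sketch of this summed objective, assuming a mean-squared-error loss and a hypothetical one-parameter model (`predict_next`) that scales the latest chunk. Neither the loss type nor the model is specified in the source; both are placeholders for illustration.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def predict_next(weight, history):
    """Hypothetical model: predicts the next chunk by scaling the latest chunk."""
    return [weight * v for v in history[-1]]


def combined_loss(weight, clean_history, degraded_history, target):
    """Return (L, L_clean, L_degraded) with L = L_clean + L_degraded."""
    l_clean = mse(predict_next(weight, clean_history), target)
    l_degraded = mse(predict_next(weight, degraded_history), target)
    return l_clean + l_degraded, l_clean, l_degraded


total, l_clean, l_degraded = combined_loss(
    weight=1.0,
    clean_history=[[1.0, 2.0]],
    degraded_history=[[1.5, 2.5]],
    target=[1.0, 2.0],
)
```

Both terms share the same target chunk; only the conditioning history differs, so the degraded term penalizes the model when a corrupted context pulls its prediction away from ground truth.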

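Key design

Stop-gradient: the self-resampling rollout runs under stop_grad, so gradients flow back only through the main forward pass
Probabilistic mixing: the degraded history is used with probability p and the clean history with probability 1-p
No teacher required: does not rely on forcing distillation from a non-causal teacher model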
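The first two design points can be illustrated with a scalar toy, given here as a sketch under assumptions rather than the source's implementation. The model is assumed to be `x_next = w * x_prev`, gradients are approximated by central finite differences, and all function names are hypothetical. With stop-gradient the degraded input is frozen as a constant; without it, the gradient would also flow through the rollout.

```python
import random


def rollout(w, x0, eps):
    """Self-resampling rollout: the model's own prediction plus noise eps."""
    return w * x0 + eps


def degraded_loss(w, degraded_x1, x2):
    """Squared error of the main forward pass conditioned on the degraded value."""
    return (w * degraded_x1 - x2) ** 2


def grad_with_stop_gradient(w, x0, x2, eps, h=1e-6):
    """d(loss)/dw with the rollout output frozen (treated as a constant)."""
    frozen = rollout(w, x0, eps)
    return (degraded_loss(w + h, frozen, x2) - degraded_loss(w - h, frozen, x2)) / (2 * h)


def grad_through_rollout(w, x0, x2, eps, h=1e-6):
    """d(loss)/dw when the rollout is also differentiated (no stop-gradient)."""
    up = degraded_loss(w + h, rollout(w + h, x0, eps), x2)
    down = degraded_loss(w - h, rollout(w - h, x0, eps), x2)
    return (up - down) / (2 * h)


def pick_history(clean_history, degraded_history, p, rng):
    """Use the degraded history with probability p, otherwise the clean one."""
    return degraded_history if rng.random() < p else clean_history


g_stop = grad_with_stop_gradient(w=0.5, x0=2.0, x2=1.0, eps=0.1)
g_full = grad_through_rollout(w=0.5, x0=2.0, x2=1.0, eps=0.1)
chosen = pick_history("clean", "degraded", p=0.5, rng=random.Random(0))
```

For these inputs the frozen degraded value is 1.1, so the stop-gradient derivative is 2 * (0.55 - 1) * 1.1 = -0.99, while differentiating through the rollout gives 2 * (0.55 - 1) * (1.1 + 0.5 * 2) = -1.89. The difference is the extra path through the rollout that stop_grad removes.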

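Effects

Trains directly in the chunk-by-chunk causal regime used at inference
The model is inherently robust to degraded context during long-horizon deployment
Compatible with post-training methods such as reinforced-online-policy-distillation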

References

maineCoon Section 3.1
native-streaming-ar-training
Resampling Forcing (Cui et al.)