20260625: lots of new content

2026-06-25 14:08:47 +08:00
parent 91fac5b6fc
commit 6021dea160
375 changed files with 19263 additions and 251 deletions
--- a/concepts/self-resampling.md
+++ b/concepts/self-resampling.md
@@ -0,0 +1,52 @@
+---
+title: "Self-Resampling"
+created: 2026-06-20
+updated: 2026-06-20
+type: concept
+tags: ["training", "autoregressive", "streaming", "diffusion", "self-play"]
+sources: ["https://arxiv.org/abs/2606.17800"]
+---
+
+# Self-Resampling (自重采样)
+
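+**Self-Resampling** is a streaming autoregressive training technique proposed by [[maineCoon|MaineCoon]]: during training, the model is conditioned on a **noisy, degraded history that it generated itself**, rather than only on clean ground-truth history. This closes the train-test gap that autoregressive models commonly suffer from.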
+
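+## Motivation: Train-Test Gap
+
+Standard training of autoregressive diffusion models conditions on a **clean history**:
+```
+train: p(x_t | clean x_<t)
+infer: p(x_t | self-generated x_<t)  ← different conditioning distribution!
+```
+At inference time the model conditions on its own imperfect generations, and this distribution shift leads to error accumulation and content drift.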
+
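+## Mechanism
+
+In each training iteration, MaineCoon prepares two kinds of history:
+1. **Clean history** (standard): uses the ground-truth chunks
+2. **Degraded history** (generated via self-resampling):
+   - Conditioned on the clean history, the model generates a candidate chunk in a forward pass **without gradients**
+   - Gaussian noise is added to that candidate, yielding `degraded_chunk`
+   - `degraded_chunk` replaces the corresponding clean chunk to form the degraded history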
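
The three steps for building the degraded history can be sketched in plain Python. This is a toy illustration, not MaineCoon's implementation: `toy_model`, `build_degraded_history`, `index`, and `noise_std` are hypothetical names, chunks are lists of floats, and the "model" just repeats the mean of the latest chunk. In a real framework the rollout call would run with gradient tracking disabled.

```python
import random
from typing import Callable, List, Sequence

Chunk = List[float]
Model = Callable[[Sequence[Chunk]], Chunk]


def toy_model(history: Sequence[Chunk]) -> Chunk:
    """Hypothetical generator: repeats the mean of the most recent chunk."""
    last = history[-1]
    mean = sum(last) / len(last)
    return [mean] * len(last)


def build_degraded_history(
    clean_history: Sequence[Chunk],
    index: int,
    model: Model,
    noise_std: float,
    rng: random.Random,
) -> List[Chunk]:
    """Swap clean_history[index] for a noised, self-generated chunk."""
    if not 1 <= index < len(clean_history):
        raise ValueError("index must leave at least one clean chunk as context")
    # Rollout conditioned on the clean prefix. In a real framework this call
    # would run without gradient tracking.
    candidate = model(clean_history[:index])
    # Add Gaussian noise to the candidate to obtain the degraded chunk.
    degraded_chunk = [v + rng.gauss(0.0, noise_std) for v in candidate]
    # Copy every chunk so the clean history stays untouched.
    degraded_history = [list(chunk) for chunk in clean_history]
    degraded_history[index] = degraded_chunk
    return degraded_history
```

Calling `build_degraded_history(clean, 1, toy_model, 0.1, random.Random(0))` returns a new list in which only chunk 1 differs from `clean`.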
+
+The training objective computes the loss on both histories:
+```
+L = L_clean + L_degraded
+```
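
A minimal sketch of how the two terms might be summed, assuming a plain mean-squared error in place of the actual diffusion loss (the note does not specify the per-term loss). `toy_model`, `mse`, and `combined_loss` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Sequence

Chunk = List[float]
Model = Callable[[Sequence[Chunk]], Chunk]


def toy_model(history: Sequence[Chunk]) -> Chunk:
    """Hypothetical generator: repeats the mean of the most recent chunk."""
    last = history[-1]
    mean = sum(last) / len(last)
    return [mean] * len(last)


def mse(pred: Chunk, target: Chunk) -> float:
    """Mean-squared error, used here as a stand-in for the real loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def combined_loss(
    model: Model,
    clean_history: Sequence[Chunk],
    degraded_history: Sequence[Chunk],
    target: Chunk,
) -> float:
    """L = L_clean + L_degraded, both measured against the same target chunk."""
    loss_clean = mse(model(clean_history), target)
    loss_degraded = mse(model(degraded_history), target)
    return loss_clean + loss_degraded
```

Both terms share the same ground-truth target; only the conditioning history changes between them.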
+
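+## Key Design Choices
+
+- **Stop-gradient**: the self-resampling rollout runs under `stop_grad`; gradients flow back only through the main forward pass
+- **Probabilistic mixing**: the degraded history is used with probability `p`, the clean history with probability `1-p`
+- **No teacher required**: does not rely on forcing-style distillation from a non-causal teacher model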
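
The probabilistic-mixing bullet can be sketched as a single Bernoulli draw. This assumes the choice is made once per training sample; the note does not say at what granularity `p` is applied. `pick_history` is a hypothetical helper name.

```python
import random
from typing import List, Sequence

Chunk = List[float]


def pick_history(
    clean_history: Sequence[Chunk],
    degraded_history: Sequence[Chunk],
    p: float,
    rng: random.Random,
) -> Sequence[Chunk]:
    """Return the degraded history with probability p, else the clean one."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so p=0 always yields the clean
    # history and p=1 always yields the degraded one.
    return degraded_history if rng.random() < p else clean_history
```

Setting `p = 0` recovers standard clean-history training, which makes `p` a convenient knob for ablations.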
+
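+## Effects
+
+- Trains directly in the chunk-by-chunk causal regime used at inference
+- Makes the model inherently robust to degraded context in long-horizon deployment
+- Compatible with post-training methods such as [[reinforced-online-policy-distillation|ROPD]]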
+
+## References
+- [[maineCoon|MaineCoon paper]], Section 3.1
+- [[native-streaming-ar-training|Native streaming AR training]]
+- Resampling Forcing (Cui et al.)