20260601

2026-06-01 10:46:01 +08:00
parent 2faf4bb002
commit e96b955fda
221 changed files with 10219 additions and 332 deletions
--- a/concepts/distribution-shift.md
+++ b/concepts/distribution-shift.md
@@ -0,0 +1,36 @@
+---
+title: "Distribution Shift"
+created: 2026-05-18
+type: concept
+tags: ["pre-training", "LLM", "domain-adaptation"]
+sources: ["https://arxiv.org/abs/2604.14142"]
+---
+
+# Distribution Shift
+
+## Definition in the PreRL Context
+
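+The **static corpora** used in conventional pre-training (web text, Wikipedia) differ significantly in distribution from the **task distribution** of downstream reasoning tasks. This shift has three consequences:
+- Pre-trained knowledge does not strengthen reasoning ability in a targeted way
+- Direct SFT fine-tuning is limited by the pre-training distribution
+- RLVR can partly compensate, but it is bounded by the ceiling of the base model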
+
+## PreRL's Solution
+
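+PreRL applies **online, reward-driven** updates directly to P(y), which removes the need to bridge the distribution gap from corpus to task:
+- Samples self-rollouts from the task instead of using a static corpus
+- Uses verifiable rewards rather than the NTP loss
+- Updates only the response portion, which keeps training aligned with the task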
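+
+As a rough sketch of the contrast above (the notation is an assumption for illustration, not taken from the source): let $\mathcal{D}$ be the static corpus, $y$ a self-rollout sampled from the model $P_\theta$ on a task, and $r(y)$ a verifiable reward. NTP training minimizes a likelihood loss on corpus tokens, whereas a generic REINFORCE-style reward-driven update weights the log-probability of the model's own response tokens by the reward:
+
+$$
+\mathcal{L}_{\text{NTP}}(\theta) = -\,\mathbb{E}_{x \sim \mathcal{D}} \sum_{t} \log P_\theta(x_t \mid x_{<t})
+$$
+
+$$
+\nabla_\theta J(\theta) = \mathbb{E}_{y \sim P_\theta}\Big[\, r(y) \sum_{t \in \text{response}} \nabla_\theta \log P_\theta(y_t \mid y_{<t}) \Big]
+$$
+
+Under this reading, the training data are drawn from the current model on the task itself, so there is no fixed corpus whose distribution could diverge from the task. The exact PreRL objective may differ, for example in its baseline or normalization.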
+
+## Comparison
+
+| Method | Data source | Learning signal | Distribution shift |
+|--------|-------------|-----------------|--------------------|
+| Pre-training | Static corpus | NTP | High |
+| Continual Pre-training | Task-relevant corpus | NTP | Medium |
+| PreRL | Online rollout | Verifiable reward | **Low** |
+
+## Related Concepts
+
+- [[pre-train-space-reinforcement-learning|PreRL]]
+- [[dual-space-rl|DSRL]]