20260514: Add new content

2026-05-14 13:54:52 +08:00
parent 56c4d3ef7c
commit b116710e4c
294 changed files with 10682 additions and 255 deletions
--- a/concepts/generation-verification-asymmetry.md
+++ b/concepts/generation-verification-asymmetry.md
@@ -0,0 +1,42 @@
+---
+title: Generation-Verification Asymmetry
+created: 2025-04-15
+updated: 2026-05-01
+type: concept
+tags: []
+sources: []
+---
+
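+# Generation-Verification Asymmetry (GVA)
+
+**The computational asymmetry in which generating a solution is hard but verifying one is easy** is the theoretical basis for [[self-verification-rewards|self-verification rewards]] and scalable URLVR (unsupervised RLVR).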
+
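+## Core Insight
+
+Many structured tasks have a natural asymmetry:
+
+| Task | Generation difficulty | Verification difficulty |
+|------|---------|---------|
+| Math reasoning | Requires multi-step derivation | Only needs to evaluate the final expression |
+| Code generation | Requires logic design | Only needs to run test cases |
+| Constraint satisfaction | Requires backtracking search | Only needs to check the constraints |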
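+
+One way to make the table precise, using notation that is an illustrative assumption and not part of the original note: let $C_{\text{gen}}(x)$ be the compute needed to produce a correct solution for instance $x$, and $C_{\text{ver}}(x, y)$ the compute needed to check a candidate solution $y$. The asymmetry can then be read roughly as:
+
+$$
+C_{\text{ver}}(x, y) \ll C_{\text{gen}}(x)
+\qquad\Longleftrightarrow\qquad
+\rho(x, y) = \frac{C_{\text{gen}}(x)}{C_{\text{ver}}(x, y)} \gg 1 .
+$$
+
+In the constraint-satisfaction row, for instance, checking a full assignment takes time linear in the number of constraints (assuming each constraint is checked in constant time), whereas backtracking search can take time exponential in the number of variables in the worst case.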
+
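+## Key Significance for URLVR
+
+This asymmetry is already essential in standard RLVR (verification by code execution, math answer matching), but it matters even more in URLVR:
+- **Intrinsic rewards**: the model derives the signal from itself → limited by the model's prior
+- **External rewards + GVA**: the model generates and the model verifies, but the verification step relies on "computation" rather than "confidence" → may break through that ceiling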
+
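+## Experimental Insight from He et al.
+
+On the Countdown task, self-verification that exploits GVA showed evidence of sustained improvement without collapse. The larger the gap between generation and verification, the more reliable the external reward signal.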
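+
+A rough count suggests how wide this gap can be on Countdown. This is a sketch under assumptions the note does not state, and it is not a description of the setup He et al. used: an instance has $n$ source numbers, each used exactly once, combined strictly left to right with the four operators $+, -, \times, \div$. The number of candidate expressions a generator may have to search, and the number of arithmetic steps a verifier needs for one candidate, are then:
+
+$$
+N_{\text{candidates}} \le n! \cdot 4^{\,n-1},
+\qquad
+N_{\text{verify}} = n - 1 .
+$$
+
+For $n = 4$ this gives at most $4! \cdot 4^{3} = 24 \cdot 64 = 1536$ candidates, against 3 arithmetic operations to check any one of them. For $n = 6$ it gives at most $6! \cdot 4^{5} = 720 \cdot 1024 = 737{,}280$ candidates, against 5 operations. The bound is an inequality because repeated source numbers produce duplicate orderings. Allowing parentheses or subsets of the numbers would enlarge the search space further without making a single check noticeably more expensive.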
+
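+## Generalization
+
+GVA is not limited to math and code: any domain where generation costs more than verification (logical reasoning, planning, sorting) may be able to exploit this asymmetry to build scalable unsupervised rewards.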
+
+## Related Concepts
+
+- [[self-verification-rewards]]: a concrete method that exploits GVA
+- [[unsupervised-rlvr]]: overview of the URLVR landscape
+- [[he-urlvr-sharpening-2026]]: survey reference