20260601
42
concepts/held-out-validation-gate.md
Normal file
@@ -0,0 +1,42 @@
---
title: "Held-Out Validation Gate (留出验证门)"
created: 2026-05-29
updated: 2026-05-29
type: concept
tags: ["optimization", "validation", "skill", "gate"]
sources: ["https://arxiv.org/abs/2605.23904"]
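---

# Held-Out Validation Gate (留出验证门)

The **Held-Out Validation Gate** is a key safety mechanism in [[skillopt|SkillOpt]]: every candidate skill edit must be evaluated on a held-out validation set, and it is accepted only if it yields a strict improvement. It is the text-space counterpart of validation-based model selection in deep learning.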

## Workflow

```
Candidate Skill → evaluate on D_sel →
  Improved?     → Accept → may become best_skill.md
  Not improved? → [[rejected-edit-buffer|Reject → buffer]]
```
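
The accept/reject branch above can be sketched as a single decision function. This is a minimal illustration, not SkillOpt's actual code: the names `gate`, `score_on_d_sel`, and `rejected_buffer` are hypothetical, and the scorer is a toy stand-in for evaluating a skill on D_sel. The point it shows is the strict inequality, so a candidate that merely ties the current skill is rejected.

```python
from typing import Callable, List


def gate(candidate: str,
         current: str,
         score_on_d_sel: Callable[[str], float],
         rejected_buffer: List[str]) -> str:
    """Return the skill to keep after one gate decision (hypothetical helper)."""
    # Strict comparison: a tie counts as "not improved".
    if score_on_d_sel(candidate) > score_on_d_sel(current):
        return candidate
    # Rejected edits are kept aside rather than silently dropped.
    rejected_buffer.append(candidate)
    return current


# Toy scorer (assumption for illustration only): longer text scores higher.
def toy_score(skill: str) -> float:
    return float(len(skill))


buffer: List[str] = []
kept_after_improvement = gate("use tools carefully", "use tools", toy_score, buffer)
kept_after_tie = gate("abc", "xyz", toy_score, buffer)
```

In this toy run the first candidate is accepted, while the second ties the current skill and therefore lands in the buffer.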

## Why It Matters

An LLM can generate edits that look plausible but actually degrade the target model's performance. The validation gate turns **reflection** into **propose-and-test optimization**, as opposed to unconditional self-editing.

## Two Checks

- **Improvement over current**: Is the candidate skill better than the current skill?
- **All-time best**: Does it beat the best score seen so far? → update `best_skill.md`

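
A rough sketch of how the two checks might be tracked together follows. It assumes, and the note does not say this, that scores on D_sel are re-measured at each step and can fluctuate, which is the situation where "better than current" and "better than all-time best" can disagree. `GateState`, `gate_step`, and the score table are hypothetical names and data.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GateState:
    current_skill: str
    best_skill: str
    best_score: float
    rejected: List[str] = field(default_factory=list)


def gate_step(state: GateState, candidate: str,
              evaluate: Callable[[str], float]) -> bool:
    """Apply both checks to one candidate; return True if it was accepted."""
    current_score = evaluate(state.current_skill)
    candidate_score = evaluate(candidate)
    # Check 1: strict improvement over the current skill.
    if candidate_score <= current_score:
        state.rejected.append(candidate)
        return False
    state.current_skill = candidate
    # Check 2: strict improvement over the all-time best.
    if candidate_score > state.best_score:
        state.best_skill = candidate
        state.best_score = candidate_score  # best_skill.md would be rewritten here
    return True


# Toy scores (made up). best_score=0.80 stands for an earlier, luckier measurement.
scores = {"v0": 0.60, "v1": 0.70, "v2": 0.65}
state = GateState(current_skill="v0", best_skill="v0", best_score=0.80)
accepted_v1 = gate_step(state, "v1", scores.__getitem__)
accepted_v2 = gate_step(state, "v2", scores.__getitem__)
```

Here `v1` passes the first check but not the second, so it becomes the current skill without replacing the best one, and `v2` is rejected. With a fully deterministic evaluator and strict acceptance, every accepted candidate would also be a new best, so the two checks would coincide.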

## Analogy to Deep Learning

```
Deep learning: select the best checkpoint on the val set
SkillOpt:      gate every skill edit on D_sel
```

## Related

- [[text-space-optimizer]]: the text-space optimization paradigm
- [[skillopt]]: the method that uses the validation gate
- [[rejected-edit-buffer]]: where edits rejected by the gate end up