20260601

2026-06-01 10:46:01 +08:00
parent 2faf4bb002
commit e96b955fda
221 changed files with 10219 additions and 332 deletions
--- a/concepts/harness-as-action-verifier.md
+++ b/concepts/harness-as-action-verifier.md
@@ -0,0 +1,50 @@
+---
+title: "Harness-as-Action-Verifier"
+created: 2026-05-29
+updated: 2026-05-29
+type: concept
+tags: ["agent", "verification", "code-synthesis", "LLM"]
+sources: ["https://arxiv.org/abs/2603.03329"]
+---
+
+# Harness-as-Action-Verifier
+
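+**Harness-as-Action-Verifier** is the core harness pattern of [[autoharness|AutoHarness]]: the LLM proposes an action and a code harness verifies that it is legal; if the action is illegal, the LLM is asked to propose again.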
+
+## Workflow
+
+```
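+1. LLM observes the environment state → proposes an action
+2. is_legal_action(obs, action) → checks legality
+3. Legal → execute the action
+   Illegal → inject an "illegal action" warning into the LLM prompt → return to step 1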
+```
+
+In essence this is a **rejection sampler** whose acceptance condition (`is_legal_action()`) is learned from environment feedback.
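
The loop above can be sketched in Python as follows. This is a minimal illustration, not the AutoHarness implementation: `propose` stands in for the LLM call, the `max_retries` cap is an assumption (the note does not say how many re-proposals are allowed), and the toy helpers are hypothetical.

```python
from typing import Callable, Optional


def verified_act(
    obs: str,
    propose: Callable[[str], str],
    is_legal_action: Callable[[str, str], bool],
    max_retries: int = 5,  # assumed cap; not specified in the note
) -> Optional[str]:
    """Rejection-sample an action: re-prompt until the harness accepts one."""
    prompt = obs
    for _ in range(max_retries):
        action = propose(prompt)
        if is_legal_action(obs, action):
            return action  # accepted sample
        # Rejected: feed an "illegal action" warning back into the prompt.
        prompt = f"{prompt}\nWarning: illegal action {action!r}. Propose another."
    return None  # no legal action found within the retry budget


# Hypothetical stand-ins for the learned verifier and the LLM.
def toy_is_legal(obs: str, action: str) -> bool:
    return action in {"left", "right"}


def make_toy_llm(replies):
    """Return a fake LLM that yields the scripted replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)
```

With the scripted replies `["jump", "left"]`, the first proposal is rejected and the second is accepted, so the call returns `"left"`.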
+
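+## Training
+
+- 10 parallel environments, with each rollout capped at 1000 steps
+- A rollout terminates as soon as an illegal action occurs
+- Up to 5 failed steps are sampled → the Critic analyzes them → the Refiner generates improved code
+- Thompson sampling guides the search direction
+- Training completes after 14.5 iterations on average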
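
As a rough illustration of how Thompson sampling could steer such a search, the sketch below keeps a Beta posterior per candidate harness and refines whichever candidate draws the highest sample. The `Candidate` class, the clean/failed rollout counts, and the uniform Beta(1, 1) prior are all assumptions made for this example; the note does not describe the actual reward signal or search structure.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate harness."""
    name: str
    clean_rollouts: int = 0   # rollouts that finished with no illegal action
    failed_rollouts: int = 0  # rollouts terminated by an illegal action


def thompson_pick(candidates, rng: random.Random) -> Candidate:
    """Draw one sample per candidate from its Beta posterior; pick the max."""
    def draw(c: Candidate) -> float:
        # Beta(1 + successes, 1 + failures) under an assumed uniform prior.
        return rng.betavariate(1 + c.clean_rollouts, 1 + c.failed_rollouts)
    return max(candidates, key=draw)


pool = [
    Candidate("strong", clean_rollouts=1000, failed_rollouts=0),
    Candidate("weak", clean_rollouts=0, failed_rollouts=1000),
]
chosen = thompson_pick(pool, random.Random(0))
```

Candidates with little evidence have wide posteriors and are still picked occasionally, which is how this scheme trades exploration against exploitation.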
+
+## Results
+
+- **100% legal-action rate** across 145 TextArena games
+- Gemini-2.5-Flash + Verifier beats bare Gemini-2.5-Pro (no harness)
+
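+## Difference from Action Filter
+
+| Aspect | Verifier | Action Filter |
+|------|----------|---------------|
+| LLM role | Policy maker | Ranker |
+| Action generation | LLM proposes freely | Code enumerates legal actions |
+| Best suited for | Large action spaces | Tractable action spaces |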
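
For contrast, the Action Filter column can be sketched as below: code enumerates the legal actions first, and the LLM only orders them. The names `enumerate_legal` and `rank`, and the toy board encoding, are hypothetical choices for this example rather than anything defined in the note.

```python
from typing import Callable, List


def filter_act(
    obs: str,
    enumerate_legal: Callable[[str], List[int]],
    rank: Callable[[str, List[int]], List[int]],
) -> int:
    """Action Filter: code lists the legal actions, the LLM only ranks them."""
    candidates = enumerate_legal(obs)
    if not candidates:
        raise ValueError("no legal actions available")
    for action in rank(obs, candidates):
        if action in candidates:  # ignore anything the ranker made up
            return action
    return candidates[0]  # fall back if the ranking was unusable


# Toy 9-cell board: '.' marks an empty cell.
def toy_enumerate(obs: str) -> List[int]:
    return [i for i, cell in enumerate(obs) if cell == "."]


# Stand-in for the LLM ranker: prefer cells closest to the center (index 4).
def toy_rank(obs: str, actions: List[int]) -> List[int]:
    return sorted(actions, key=lambda i: abs(i - 4))
```

Because every candidate is legal by construction, no retry loop is needed here, but the approach only works when the legal actions can be listed explicitly.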
+
+## Related
+
+- [[autoharness]]: the full method
+- [[harness-as-policy]]: a more aggressive alternative
+- [[action-applicability]]: the core problem