SidneyZhang/myWiki

Sidney Zhang e96b955fda

20260601

2026-06-01 10:46:01 +08:00

title: Thompson Sampling Code Search
created: 2026-05-29
updated: 2026-05-29
type: concept
tags: search, code-synthesis, thompson-sampling, optimization
sources: https://arxiv.org/abs/2603.03329

Thompson Sampling Code Search

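Thompson Sampling Code Search is the search algorithm that autoharness uses to explore the space of code harnesses: it maintains a tree of code hypotheses and uses Thompson sampling (Tang et al., 2024) to choose which node to refine next.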

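Algorithm

Tree structure: each node is a code hypothesis (one version of the harness).
Heuristic value: each node's average legal-action rate.
Selection: Thompson sampling balances exploration (trying different code logic) against exploitation (refining harnesses that have already made progress).
Refinement: the selected node is passed to the Refiner (an LLM), which generates an improved version based on feedback from the environment Critic.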
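
The four steps above can be sketched as a small loop. This is a minimal illustration, not the autoharness implementation. The Beta(1 + legal, 1 + illegal) posterior over each node's legal-action rate is an assumption, since the page only says the heuristic value is the average legal-action rate. The names `Node`, `select_node`, `search`, `toy_evaluate`, and `toy_refine` are hypothetical, and the toy functions stand in for the environment Critic and the LLM Refiner.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One code hypothesis (a version of the harness) in the search tree."""
    code: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    legal: int = 0       # legal actions observed when evaluating this harness
    illegal: int = 0     # illegal actions observed
    feedback: str = ""   # Critic feedback handed to the Refiner

    @property
    def legal_rate(self) -> float:
        """Heuristic value: average legal-action rate (0.0 if unevaluated)."""
        total = self.legal + self.illegal
        return self.legal / total if total else 0.0


def select_node(nodes: List[Node], rng: random.Random) -> Node:
    """Thompson sampling: draw one posterior sample per node, take the argmax."""
    # Assumption: Beta(1 + legal, 1 + illegal) posterior for each node.
    return max(nodes, key=lambda n: rng.betavariate(1 + n.legal, 1 + n.illegal))


def search(
    root_code: str,
    evaluate: Callable[[str], Tuple[int, int, str]],  # stand-in for the Critic
    refine: Callable[[str, str], str],                # stand-in for the Refiner
    max_iters: int = 50,
    seed: int = 0,
) -> Tuple[Node, int]:
    """Return the best node found and the number of refinements performed."""
    rng = random.Random(seed)
    root = Node(root_code)
    root.legal, root.illegal, root.feedback = evaluate(root.code)
    nodes = [root]
    for done in range(max_iters):
        best = max(nodes, key=lambda n: n.legal_rate)
        if best.legal_rate == 1.0:   # every observed action was legal
            return best, done
        parent = select_node(nodes, rng)
        child = Node(refine(parent.code, parent.feedback), parent=parent)
        child.legal, child.illegal, child.feedback = evaluate(child.code)
        parent.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda n: n.legal_rate), max_iters


def toy_evaluate(code: str) -> Tuple[int, int, str]:
    """Toy Critic: version 'vN' produces min(N, 5) * 2 legal actions out of 10."""
    legal = min(int(code[1:]), 5) * 2
    return legal, 10 - legal, f"{10 - legal} of 10 actions were illegal"


def toy_refine(code: str, feedback: str) -> str:
    """Toy Refiner: each refinement bumps the version number by one."""
    return f"v{int(code[1:]) + 1}"


if __name__ == "__main__":
    best, iters = search("v0", toy_evaluate, toy_refine, max_iters=200)
    print(best.code, best.legal_rate, iters)
```

Because any node in the tree can be selected, not only the current best, the search can return to an earlier harness version when a later refinement turns out to be a dead end.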

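Why Thompson Sampling?

Online learning: every step requires a decision about which direction to try next.
Uncertainty quantification: Thompson sampling naturally accounts for uncertainty in each node's value estimate.
Exploration-exploitation balance: posterior sampling balances the two automatically, since nodes with high uncertainty are still selected with positive probability.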
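
The last point can be checked numerically. The sketch below estimates how often Thompson sampling would pick each of two nodes, again assuming a Beta(1 + legal, 1 + illegal) posterior per node (an assumption, not something the page specifies). The function `selection_frequency` and the observation counts are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Tuple


def selection_frequency(
    counts: List[Tuple[int, int]], draws: int = 10_000, seed: int = 0
) -> List[float]:
    """Estimate how often Thompson sampling selects each node.

    counts holds one (legal, illegal) observation pair per node.
    """
    rng = random.Random(seed)
    wins = [0] * len(counts)
    for _ in range(draws):
        # One posterior sample per node; the node with the largest sample wins.
        samples = [rng.betavariate(1 + s, 1 + f) for s, f in counts]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


# Node A: well estimated (90 legal, 10 illegal actions observed).
# Node B: barely evaluated (1 legal, 1 illegal action observed).
freq_a, freq_b = selection_frequency([(90, 10), (1, 1)])
print(f"A selected {freq_a:.1%} of the time, B {freq_b:.1%}")
```

A greedy rule that always picks the highest observed rate would never select node B. Under posterior sampling, B's wide posterior occasionally produces the largest sample, so it still gets refined now and then, and that share shrinks as more evaluations narrow its posterior.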

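Role in Harness Synthesis

Thompson sampling decides which code version to refine in each iteration, which directly affects search efficiency. On average, only 14.5 iterations are needed to reach a 100% legal-action rate.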

Related

autoharness: the method that uses this search
iterative-code-refinement: the operation performed at each step
lou-autoharness-2026: the original paper