SidneyZhang/myWiki

Sidney Zhang e96b955fda

20260601

2026-06-01 10:46:01 +08:00

title: Iterative Code Refinement (迭代代码精炼)
created: 2026-05-29
updated: 2026-05-29
type: concept
tags: code-synthesis, optimization, LLM, feedback-loop
sources: https://arxiv.org/abs/2603.03329

Iterative Code Refinement (迭代代码精炼)

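Iterative Code Refinement is the core optimization loop of autoharness: an LLM acts as a gradient-free code optimizer, repeatedly improving the code harness based on feedback from the environment.

Loop structure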

Old Code → Refiner (LLM) → New Code → Evaluator (Environment) → Critic → Refiner → ...
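
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the autoharness implementation: `refine_loop`, `evaluate`, `refine`, and `max_rounds` are hypothetical names, the Evaluator and Critic are folded into one callable, and the toy stand-ins replace the LLM and the environment so the sketch runs on its own.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a list of failure messages, empty when nothing failed.
Feedback = List[str]


def refine_loop(
    code: str,
    evaluate: Callable[[str], Feedback],     # Evaluator + Critic folded together (assumption)
    refine: Callable[[str, Feedback], str],  # Refiner; stands in for an LLM call
    max_rounds: int = 5,                     # hypothetical budget, not from the source
) -> Tuple[str, int]:
    """Return the final code and the number of refinement rounds used."""
    for round_idx in range(max_rounds):
        feedback = evaluate(code)
        if not feedback:  # no failures reported: stop early
            return code, round_idx
        code = refine(code, feedback)
    return code, max_rounds


# Toy stand-ins so the sketch runs without an LLM or a real environment.
def toy_evaluate(code: str) -> Feedback:
    return [] if "return x + 1" in code else ["expected f(1) == 2"]


def toy_refine(code: str, feedback: Feedback) -> str:
    return code.replace("return x", "return x + 1")


final_code, rounds = refine_loop("def f(x): return x", toy_evaluate, toy_refine)
```

In this toy run a single refinement round is enough; a real Refiner would receive the feedback as part of its prompt.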

Refiner

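Receives: the current code, the failure cases, and the error messages
Outputs: the improved code
Role: gradient-free optimizer; it needs no gradients and "debugs" the code through language understanding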

Critic

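Receives: the environment rollout results
Outputs: structured feedback (which actions were illegal, why, and the environment reward)
Role: gives the Refiner an "optimization direction"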
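
One way to picture the Critic's structured feedback is as a small record built from the rollout. The sketch below is an assumption made for illustration: `StepRecord`, `CriticFeedback`, `critique`, and `to_prompt` are hypothetical names, and the source does not specify the actual feedback format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepRecord:
    """One rollout step (hypothetical shape)."""
    action: str
    legal: bool
    reason: Optional[str]  # why the environment rejected the action, if it did
    reward: float


@dataclass
class CriticFeedback:
    """Structured feedback: illegal actions, reasons, and environment reward."""
    illegal_actions: List[str]
    reasons: List[str]
    total_reward: float

    def to_prompt(self) -> str:
        """Render the feedback as plain text for the Refiner's prompt."""
        lines = [f"- {a}: {r}" for a, r in zip(self.illegal_actions, self.reasons)]
        lines.append(f"total reward: {self.total_reward}")
        return "\n".join(lines)


def critique(rollout: List[StepRecord]) -> CriticFeedback:
    """Summarize a rollout into feedback the Refiner can act on."""
    illegal = [s for s in rollout if not s.legal]
    return CriticFeedback(
        illegal_actions=[s.action for s in illegal],
        reasons=[s.reason or "unknown" for s in illegal],
        total_reward=sum(s.reward for s in rollout),
    )


feedback = critique([
    StepRecord("e2e4", True, None, 1.0),
    StepRecord("e2e5", False, "pawn cannot move three squares", 0.0),
])
```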

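Refinement strategy

is_legal_action() returns True but the action is invalid (the check wrongly accepted it) → refine both propose_action() and is_legal_action()
is_legal_action() returns False and the action is invalid (the check was right, the proposal was not) → refine only propose_action()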
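
The two rules above amount to a small decision function. The sketch below is illustrative only: `refinement_targets` and its parameters are hypothetical names, and the case of a valid action (nothing to refine) is an assumption added for completeness.

```python
from typing import Set


def refinement_targets(is_legal_verdict: bool, action_was_valid: bool) -> Set[str]:
    """Pick which harness functions to refine after one invalid action.

    is_legal_verdict: what is_legal_action() returned for the action.
    action_was_valid: whether the environment actually accepted the action.
    """
    if action_was_valid:
        # Assumption: a valid action triggers no refinement.
        return set()
    if is_legal_verdict:
        # The legality check wrongly accepted the action, so both the
        # proposer and the checker are at fault.
        return {"propose_action", "is_legal_action"}
    # The checker correctly flagged the action; only the proposer is at fault.
    return {"propose_action"}


both = refinement_targets(is_legal_verdict=True, action_was_valid=False)
proposer_only = refinement_targets(is_legal_verdict=False, action_was_valid=False)
```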

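Comparison with standard LLM code generation

Property	Standard generation	Iterative Refinement
Attempts	1	Multiple rounds
Feedback	None	Environment feedback
Search	None	Guided by Thompson sampling
Success rate	Depends on the prompt	Can reach 100%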
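
The "Search" row mentions Thompson sampling. As a rough illustration of the general technique, and not of how autoharness applies it, the sketch below uses a standard Beta-Bernoulli Thompson sampler to choose which candidate harness to refine next. `pick_candidate` and the success/failure counts are hypothetical.

```python
import random
from typing import List, Tuple


def pick_candidate(stats: List[Tuple[int, int]], rng: random.Random) -> int:
    """Choose a candidate index by Thompson sampling.

    stats[i] = (successes, failures) observed for candidate i. Each candidate
    gets a Beta(successes + 1, failures + 1) posterior (uniform prior); we draw
    one sample per candidate and pick the largest draw.
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in stats]
    return max(range(len(stats)), key=samples.__getitem__)


rng = random.Random(0)                 # fixed seed so runs are repeatable
stats = [(1, 9), (8, 2), (0, 0)]       # made-up counts for three candidates
picks = [pick_candidate(stats, rng) for _ in range(1000)]
counts = [picks.count(i) for i in range(len(stats))]
```

The candidate with the best record is chosen most often, while the untried candidate still gets picked sometimes, which is the exploration behavior that makes the search more than a greedy retry.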

Related

autoharness: the method that uses this refinement
thompson-sampling-code-search: the search that selects refinement targets
lou-autoharness-2026: the original paper