20260601

2026-06-01 10:46:01 +08:00
parent 2faf4bb002
commit e96b955fda
221 changed files with 10219 additions and 332 deletions
--- a/concepts/iterative-code-refinement.md
+++ b/concepts/iterative-code-refinement.md
@@ -0,0 +1,48 @@
+---
+title: "Iterative Code Refinement"
+created: 2026-05-29
+updated: 2026-05-29
+type: concept
+tags: ["code-synthesis", "optimization", "LLM", "feedback-loop"]
+sources: ["https://arxiv.org/abs/2603.03329"]
+---
+
+# Iterative Code Refinement
+
+**Iterative Code Refinement** is the core optimization loop of [[autoharness|AutoHarness]]: an LLM acts as a gradient-free code optimizer, repeatedly improving the code harness based on environment feedback.
+
+## Loop structure
+
+```
+Old Code → Refiner (LLM) → New Code → Evaluator (Environment) → Critic → Refiner → ...
+```
+
+### Refiner
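+- Input: current code + failure cases + error messages
+- Output: improved code
+- Role: **gradient-free optimizer** that needs no gradients and instead "debugs" the code through language understanding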
+
+### Critic
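+- Input: environment rollout results
+- Output: structured feedback (which actions were illegal, why, and the environment reward)
+- Role: gives the Refiner an "optimization direction"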
+
+## Refinement strategy
+
+- `is_legal_action()` returns True but the action is invalid → refine both `propose_action()` and `is_legal_action()`
+- `is_legal_action()` returns False and the action is invalid → refine only `propose_action()`
+
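+## Comparison with standard LLM code generation
+
+| Property | Standard generation | Iterative Refinement |
+|------|----------|---------------------|
+| Attempts | 1 | Multiple rounds |
+| Feedback | None | Environment feedback |
+| Search | None | Guided by Thompson sampling |
+| Success rate | Depends on the prompt | Can reach 100% |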
+
+## Related
+
+- [[autoharness]]: the method that uses this refinement
+- [[thompson-sampling-code-search]]: the search that selects refinement targets
+- [[lou-autoharness-2026]]: the original paper
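
The two refinement-strategy rules in the note above can be read as a small decision function. The following is a minimal sketch, not the paper's implementation: the helper name `functions_to_refine` and its boolean parameters are hypothetical, and the branch for valid actions (refine nothing) is an assumption, since the note only describes the invalid-action cases.

```python
from typing import FrozenSet

# Names of the two harness functions mentioned in the note.
PROPOSE = "propose_action"
IS_LEGAL = "is_legal_action"


def functions_to_refine(predicted_legal: bool, action_valid: bool) -> FrozenSet[str]:
    """Pick which harness functions the Refiner should rewrite.

    predicted_legal: what is_legal_action() returned for the action.
    action_valid:    whether the environment actually accepted the action.
    """
    if action_valid:
        # Assumption: the note does not cover valid actions, so nothing is refined.
        return frozenset()
    if predicted_legal:
        # The legality check wrongly accepted an invalid action: refine both.
        return frozenset({PROPOSE, IS_LEGAL})
    # The legality check flagged the action correctly: refine only the proposer.
    return frozenset({PROPOSE})
```

For example, `functions_to_refine(True, False)` selects both functions, while `functions_to_refine(False, False)` selects only `propose_action`.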