20260601
48
concepts/iterative-code-refinement.md
Normal file
@@ -0,0 +1,48 @@
---
title: "Iterative Code Refinement"
created: 2026-05-29
updated: 2026-05-29
type: concept
tags: ["code-synthesis", "optimization", "LLM", "feedback-loop"]
sources: ["https://arxiv.org/abs/2603.03329"]
---

# Iterative Code Refinement

**Iterative Code Refinement** is the core optimization loop of [[autoharness|AutoHarness]]: the LLM acts as a gradient-free code optimizer that repeatedly improves a code harness based on environment feedback.

## Loop structure

```
Old Code → Refiner (LLM) → New Code → Evaluator (Environment) → Critic → Refiner → ...
```
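
The cycle above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the AutoHarness implementation: `refine`, `evaluate`, and `critique` are hypothetical callables standing in for the LLM Refiner, the environment Evaluator, and the Critic, and the toy stand-ins at the bottom exist only so the sketch runs.

```python
from typing import Callable, List


def refinement_loop(
    code: str,
    refine: Callable[[str, str], str],     # hypothetical: (code, feedback) -> new code
    evaluate: Callable[[str], List[str]],  # hypothetical: code -> failure messages
    critique: Callable[[List[str]], str],  # hypothetical: failures -> feedback text
    max_rounds: int = 10,
) -> str:
    """Refine `code` until the evaluator reports no failures or rounds run out."""
    for _ in range(max_rounds):
        failures = evaluate(code)          # Evaluator (Environment)
        if not failures:
            break                          # nothing left to fix
        feedback = critique(failures)      # Critic
        code = refine(code, feedback)      # Refiner (LLM) produces the new code
    return code


# Toy stand-ins (assumptions): the "harness" passes once it defines both functions.
REQUIRED = ["propose_action", "is_legal_action"]


def toy_evaluate(code: str) -> List[str]:
    return [f"missing {name}" for name in REQUIRED if name not in code]


def toy_critique(failures: List[str]) -> str:
    return failures[0]                     # report one problem per round


def toy_refine(code: str, feedback: str) -> str:
    name = feedback[len("missing "):]
    return code + f"def {name}(state): ...\n"


final_code = refinement_loop("", toy_refine, toy_evaluate, toy_critique)
```

The loop starts with an evaluation rather than a refinement, which is the same cycle entered at a different point; `max_rounds` is an added safeguard so the sketch always terminates.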

### Refiner
- Input: the current code, failure cases, and error messages
- Output: the improved code
- Role: **gradient-free optimizer**. It needs no gradients and "debugs" the code through language understanding.

### Critic
- Input: environment rollout results
- Output: structured feedback (which actions were illegal, why, and the environment reward)
- Role: gives the Refiner an "optimization direction"
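
To make "structured feedback" concrete, the sketch below folds rollout steps into a small report. The `Step` and `Feedback` record types, their field names, and the chess-style moves are assumptions made for this example; the note itself only says the feedback covers which actions were illegal, why, and the environment reward.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One rollout step (hypothetical shape)."""
    action: str
    legal: bool
    reward: float
    error: Optional[str] = None  # environment's explanation when the action is illegal


@dataclass
class Feedback:
    """Report handed to the Refiner (hypothetical shape)."""
    illegal_actions: List[str] = field(default_factory=list)
    reasons: List[str] = field(default_factory=list)
    total_reward: float = 0.0


def critique(rollout: List[Step]) -> Feedback:
    """Collect illegal actions, their reasons, and the summed reward."""
    fb = Feedback()
    for step in rollout:
        fb.total_reward += step.reward
        if not step.legal:
            fb.illegal_actions.append(step.action)
            fb.reasons.append(step.error or "no reason reported")
    return fb


rollout = [
    Step("e2e4", legal=True, reward=0.0),
    Step("e2e5", legal=False, reward=-1.0, error="pawn cannot move three squares"),
]
report = critique(rollout)
```

A report like this could then be serialized into the Refiner's prompt alongside the current code.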
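
## Refinement strategy

- `is_legal_action()` returns True but the action is invalid → refine both `propose_action()` and `is_legal_action()`, since the legality check itself let the bad action through
- `is_legal_action()` returns False and the action is invalid → refine only `propose_action()`, since the check was correct and only the proposal was wrong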
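
The two rules can be written as a small decision function. This is only a sketch: the helper name `functions_to_refine` and its boolean parameters are hypothetical, and the "action was valid" branch is an assumption added for completeness, since the rules above cover only invalid actions.

```python
from typing import List


def functions_to_refine(checker_said_legal: bool, action_was_valid: bool) -> List[str]:
    """Return the names of the harness functions to refine after one step."""
    if action_was_valid:
        return []  # assumption: nothing failed on this step, so nothing to refine
    if checker_said_legal:
        # The checker accepted an invalid action, so it is wrong as well.
        return ["propose_action", "is_legal_action"]
    # The checker rejected the action correctly; only the proposal was wrong.
    return ["propose_action"]


both = functions_to_refine(checker_said_legal=True, action_was_valid=False)
proposer_only = functions_to_refine(checker_said_legal=False, action_was_valid=False)
```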

## Comparison with standard LLM code generation

| Property | Standard generation | Iterative Refinement |
|------|----------|---------------------|
| Attempts | 1 | Multiple rounds |
| Feedback | None | Environment feedback |
| Search | None | Guided by Thompson sampling |
| Success rate | Depends on the prompt | Can reach 100% |
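
For the "Guided by Thompson sampling" row, one common form of the idea is Beta-Bernoulli Thompson sampling over candidate programs: draw a plausible success rate for each candidate from its posterior and refine the one with the highest draw. The sketch below assumes that form, with a uniform Beta(1, 1) prior and made-up candidate names and counts; this note does not specify how AutoHarness sets it up (see [[thompson-sampling-code-search]]).

```python
import random
from typing import Dict, Tuple


def pick_candidate(stats: Dict[str, Tuple[int, int]], rng: random.Random) -> str:
    """Pick the candidate to refine next.

    `stats` maps a candidate id to (successes, failures) from past rollouts.
    """
    if not stats:
        raise ValueError("no candidates to choose from")
    best_id, best_draw = "", -1.0
    for cand_id, (wins, losses) in stats.items():
        # Posterior under a Beta(1, 1) prior (an assumption of this sketch).
        draw = rng.betavariate(wins + 1, losses + 1)
        if draw > best_draw:
            best_id, best_draw = cand_id, draw
    return best_id


rng = random.Random(0)
stats = {"harness_v1": (2, 8), "harness_v2": (9, 1)}  # illustrative counts only
picks = [pick_candidate(stats, rng) for _ in range(1000)]
```

Because each choice is a random draw, the stronger candidate is picked most of the time while weaker ones still get occasional attempts, which is the exploration behavior that distinguishes this from always refining the current best.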

## Related

- [[autoharness]]: the method that uses this refinement
- [[thompson-sampling-code-search]]: the search that selects refinement targets
- [[lou-autoharness-2026]]: the original paper