myWiki/concepts/rejected-edit-buffer.md
2026-06-01 10:46:01 +08:00


---
title: "Rejected-Edit Buffer"
created: 2026-05-29
updated: 2026-05-29
type: concept
tags: ["optimization", "negative-feedback", "skill", "buffer"]
sources: ["https://arxiv.org/abs/2605.23904"]
---
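# Rejected-Edit Buffer
The **Rejected-Edit Buffer** is the negative-feedback mechanism in [[skillopt|SkillOpt]]: edits rejected by the [[held-out-validation-gate|Validation Gate]] are recorded in an epoch-local buffer, which serves as a **negative-feedback signal** for subsequent optimization steps. By analogy, it is the text-space counterpart of the negative gradient in deep learning.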
## What It Records
The buffer contains:
- The observed failure modes
- The edits that were attempted but rejected
- The score drop each edit caused
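
The three recorded items map naturally onto a small record type. The following is a minimal sketch under stated assumptions, not SkillOpt's actual data structure: the class names `RejectedEdit` and `RejectedEditBuffer`, the field names, and the sample values are all hypothetical and chosen only to mirror the list above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RejectedEdit:
    """One rejected edit. Field names are illustrative, not taken from SkillOpt."""

    failure_mode: str  # the observed failure the edit was meant to address
    edit: str          # the edit that was attempted and then rejected
    score_drop: float  # validation score before the edit minus score after it


@dataclass
class RejectedEditBuffer:
    """Collection of rejected edits, intended to live for a single epoch."""

    entries: list[RejectedEdit] = field(default_factory=list)

    def record(self, failure_mode: str, edit: str, score_drop: float) -> None:
        """Append one rejected edit to the buffer."""
        self.entries.append(RejectedEdit(failure_mode, edit, score_drop))

    def clear(self) -> None:
        """Drop all entries, e.g. at an epoch boundary."""
        self.entries.clear()


# Made-up example values, for illustration only.
buffer = RejectedEditBuffer()
buffer.record("skips unit conversion", "Add rule: always convert to SI", 0.04)
```

Keeping each entry immutable (`frozen=True`) is a design choice of this sketch: a rejection is a historical fact and should not be edited after it is recorded.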
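## How It Is Used
Subsequent reflection calls within the same epoch receive this buffer, which lets the optimizer:
- **Avoid repeating failed edits**
- **Focus on failures that remain unresolved**
- Learn from "what does not work"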
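
One plausible way to hand the buffer to a reflection call is to render it as a section of the prompt and to guard against verbatim repeats. The sketch below assumes buffer entries are plain dictionaries with the hypothetical keys `failure_mode`, `edit`, and `score_drop`; the prompt wording and both function names are illustrative, and the source does not specify how SkillOpt formats this input.

```python
from __future__ import annotations


def render_negative_feedback(rejected: list[dict]) -> str:
    """Format rejected edits as a prompt section for the next reflection call."""
    if not rejected:
        return "No edits have been rejected in this epoch."
    lines = ["Edits already rejected in this epoch (do not propose them again):"]
    for item in rejected:
        lines.append(
            f"- failure: {item['failure_mode']} | edit: {item['edit']} "
            f"| score drop: {item['score_drop']:.3f}"
        )
    return "\n".join(lines)


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def is_repeat(candidate_edit: str, rejected: list[dict]) -> bool:
    """Exact-match guard after normalization; paraphrases are not detected."""
    target = _normalize(candidate_edit)
    return any(target == _normalize(item["edit"]) for item in rejected)


# Made-up example entry, for illustration only.
rejected = [
    {
        "failure_mode": "skips unit conversion",
        "edit": "Add rule: always convert to SI",
        "score_drop": 0.04,
    }
]
prompt_section = render_negative_feedback(rejected)
```

The exact-match guard only catches literal repeats; steering the optimizer away from paraphrased versions of a failed edit is left to the reflection model reading the rendered section.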
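## Interplay with Positive Feedback
| Signal type | Source | Effect |
|----------|------|------|
| Positive feedback | Accepted edits | Retain and reinforce |
| Negative feedback | Rejected edits (buffer) | Avoid repetition, steer toward new directions |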
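## Key Advantages
- **Used during training, zero cost at inference**: the buffer exists only during the optimization phase and adds no deployment overhead
- **Epoch-local**: each epoch has its own buffer, which keeps stale information from contaminating later epochs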
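
Both properties can be seen in a toy optimization loop. This is a sketch under loud assumptions rather than the SkillOpt algorithm: `optimize`, `toy_propose`, and `toy_score` are hypothetical, an edit is assumed to be a line appended to the skill text, and the gate is reduced to "accept only if the score strictly improves".

```python
from __future__ import annotations

from typing import Callable


def optimize(
    skill: str,
    propose: Callable[[str, list[dict]], str],  # hypothetical reflection call
    score: Callable[[str], float],              # hypothetical held-out score
    epochs: int = 2,
    steps_per_epoch: int = 3,
) -> str:
    best = score(skill)
    for _ in range(epochs):
        rejected: list[dict] = []  # fresh buffer each epoch (epoch-local)
        for _ in range(steps_per_epoch):
            edit = propose(skill, rejected)
            if not edit:  # nothing left to try in this epoch
                break
            candidate = skill + "\n" + edit  # assumption: edits append a line
            new = score(candidate)
            if new > best:  # simplified gate: accept strict improvements only
                skill, best = candidate, new
            else:
                rejected.append({"edit": edit, "score_drop": best - new})
    return skill  # only the skill text is returned; the buffer is discarded


def toy_propose(skill: str, rejected: list[dict]) -> str:
    """Return the first edit that is neither applied nor rejected this epoch."""
    tried = {item["edit"] for item in rejected}
    for edit in ("noise", "units", "check"):
        if edit not in skill and edit not in tried:
            return edit
    return ""


def toy_score(skill: str) -> float:
    """Toy metric: count how many useful keywords the skill contains."""
    return sum(keyword in skill for keyword in ("units", "check"))


final_skill = optimize("base", toy_propose, toy_score)
```

In this toy run the unhelpful edit `noise` is rejected once in the first epoch and skipped for the rest of it, then tried and rejected again in the second epoch because the buffer starts empty. That is the cost side of epoch-local scoping in this sketch: old rejections cannot mislead later epochs, but some re-exploration occurs.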
## Related
- [[held-out-validation-gate]]: the gate that produces the rejections
- [[skillopt]]: the method that uses the buffer
- [[text-space-optimizer]]: the text-space optimization paradigm