20260601
52
concepts/skillopt.md
Normal file
@@ -0,0 +1,52 @@
---
title: "SkillOpt"
created: 2026-05-29
updated: 2026-05-29
type: concept
tags: ["agent", "skill", "optimization", "text-space"]
sources: ["https://arxiv.org/abs/2605.23904"]
---
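
# SkillOpt

**SkillOpt**, proposed by Yang et al. (Microsoft, 2026), is presented as the first systematic [[text-space-optimizer|text-space optimizer]] for training an agent's skill documents. It treats the skill as the **trainable external state** of a frozen agent and applies the control discipline of deep learning optimizers to a natural-language artifact.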

## Core analogy

| Deep learning | SkillOpt |
|---------------|----------|
| Parameters θ | Skill document (300–2,000 tokens) |
| Gradient direction | Rollout trajectories → edit direction |
| Learning rate | [[textual-learning-rate|edit budget L_t]] |
| Validation | [[held-out-validation-gate|held-out gate]] |
| Momentum | [[slow-meta-update|slow update]] |
| Negative gradient | [[rejected-edit-buffer|rejected-edit buffer]] |
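
To make the analogy concrete, the state a text-space optimizer carries between steps can be pictured as a small container, much like parameters plus optimizer state in deep learning. This is only a sketch: the names `Edit` and `SkillState`, their fields, and the word-count measure of edit size are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edit:
    """One proposed change to the skill text (hypothetical representation)."""
    op: str      # "add", "delete", or "replace"
    target: str  # text to delete or replace; empty for "add"
    text: str    # text to add or substitute; empty for "delete"

    def cost(self) -> int:
        # Assumption: edit size is counted in whitespace-separated words.
        return len(self.target.split()) + len(self.text.split())


@dataclass
class SkillState:
    skill: str        # plays the role of the parameters
    edit_budget: int  # plays the role of the learning rate
    rejected: list[Edit] = field(default_factory=list)  # negative feedback

    def within_budget(self, edits: list[Edit]) -> bool:
        """Check that a set of edits does not exceed the per-step budget."""
        return sum(e.cost() for e in edits) <= self.edit_budget
```

A small budget keeps each update close to the current skill, in the same way a small learning rate keeps a gradient step close to the current parameters.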

## Training loop

1. **Rollout Batch**: The frozen agent runs on the training data using the current skill.
2. **Reflection Minibatches**: The optimizer analyzes successful and failed trajectories.
3. **Edit Proposal**: The optimizer proposes add/delete/replace edits.
4. **Aggregation & Ranking**: Edits from all minibatches are merged and ranked by expected utility.
5. **Bounded Update**: The top-ranked edits are applied within the [[textual-learning-rate|edit budget]].
6. **Validation Gate**: The candidate skill is evaluated on held-out data and accepted only if it improves the score.
7. **Rejected Buffer**: Rejected edits are recorded as negative feedback.
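
The seven steps above can be sketched as a single training step. This is a minimal illustration under stated assumptions, not the paper's implementation: `train_step`, `apply_edit`, `edit_cost`, the tuple form of an edit, the word-count cost, and the greedy way the budget is filled are all hypothetical. The `rollout`, `propose`, and `validate` callables stand in for the frozen agent, the optimizer model, and the held-out evaluation.

```python
from typing import Callable, Dict, List, Set, Tuple

Edit = Tuple[str, str, str]   # (op, target, text); op is add/delete/replace
Scored = Tuple[float, Edit]   # (expected utility, edit)


def apply_edit(skill: str, edit: Edit) -> str:
    op, target, text = edit
    if op == "add":
        return (skill + "\n" + text).strip()
    if op == "delete":
        return skill.replace(target, "").strip()
    if op == "replace":
        return skill.replace(target, text)
    raise ValueError(f"unknown op: {op}")


def edit_cost(edit: Edit) -> int:
    # Assumption: edit size is counted in whitespace-separated words.
    _, target, text = edit
    return len(target.split()) + len(text.split())


def train_step(
    skill: str,
    rollout: Callable[[str], list],
    propose: Callable[[str, list, Set[Edit]], List[Scored]],
    validate: Callable[[str], float],
    budget: int,
    rejected: Set[Edit],
    minibatch_size: int = 2,
) -> Tuple[str, bool]:
    # 1. Rollout batch: the frozen agent runs with the current skill.
    trajectories = rollout(skill)

    # 2-4. Reflect per minibatch, collect proposals, merge duplicates,
    # and skip edits that were rejected before.
    best: Dict[Edit, float] = {}
    for i in range(0, len(trajectories), minibatch_size):
        minibatch = trajectories[i:i + minibatch_size]
        for utility, edit in propose(skill, minibatch, rejected):
            if edit not in rejected:
                best[edit] = max(utility, best.get(edit, float("-inf")))
    ranked = sorted(best, key=best.get, reverse=True)

    # 5. Bounded update: take top-ranked edits that still fit the budget.
    chosen, spent = [], 0
    for edit in ranked:
        cost = edit_cost(edit)
        if spent + cost <= budget:
            chosen.append(edit)
            spent += cost
    if not chosen:
        return skill, False

    candidate = skill
    for edit in chosen:
        candidate = apply_edit(candidate, edit)

    # 6. Validation gate: accept only a strict held-out improvement.
    if validate(candidate) > validate(skill):
        return candidate, True

    # 7. Rejected buffer: remember the edits that failed the gate.
    rejected.update(chosen)
    return skill, False
```

Calling `train_step` repeatedly with the returned skill and the same `rejected` set gives a basic loop. The slow update listed in the analogy table is left out of this sketch because the steps above do not describe it.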

## Coverage

- **6 benchmarks**: SearchQA, SpreadsheetBench, OfficeQA, DocVQA, LiveMathematicianBench, ALFWorld
- **7 models**: from GPT-5.5 down to Qwen
- **3 harnesses**: Direct chat, Codex, Claude Code
- **52/52 best or tied**

## Transferability

Once trained, a skill can be reused across models, harnesses, and benchmarks:
- SpreadsheetBench skill (GPT-5.4) → improves all smaller GPT variants
- Codex-trained skill → Claude Code: +59.7 pts
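
Because a skill is plain text, reusing it on another model or harness amounts to handing the same string to a different agent and comparing scores with and without it. The sketch below shows one way such a gain in points could be measured. The `Agent` signature and the `transfer_gain` helper are hypothetical, and the success-rate metric is an assumption, not the paper's evaluation protocol.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical agent interface: (task, skill or None) -> success flag.
Agent = Callable[[str, Optional[str]], bool]


def transfer_gain(
    skill: str,
    agents: Dict[str, Agent],
    tasks: List[str],
) -> Dict[str, float]:
    """Return, per agent, the success-rate gain in points from adding the skill."""
    gains = {}
    for name, agent in agents.items():
        base = sum(agent(task, None) for task in tasks) / len(tasks)
        with_skill = sum(agent(task, skill) for task in tasks) / len(tasks)
        gains[name] = 100.0 * (with_skill - base)
    return gains
```

A positive value for an agent the skill was not trained on indicates that the skill transferred.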

## Related

- [[yang-skillopt-2026]]: original paper
- [[text-space-optimizer]]: core paradigm
- [[skill-as-external-state]]: philosophical foundation