---
title: "SkillOpt: A Text-Space Optimizer for Agent Skills"
created: 2026-05-29
updated: 2026-05-29
type: paper
arxiv: "2605.23904"
authors: ["Yifan Yang", "Ziyang Gong", "Weiquan Huang", "Qihao Yang", "Ziwei Zhou", "Zisu Huang", "Yan Li", "Xuemei Gao", "Qi Dai", "Bei Liu", "Kai Qiu", "Yuqing Yang", "Dongdong Chen", "Xue Yang", "Chong Luo"]
venue: "arXiv cs.AI, May 2026"
tags: ["agent", "skill", "optimization", "text-space", "self-evolving"]
sources: ["https://arxiv.org/abs/2605.23904"]
code: "https://github.com/microsoft/SkillOpt"
---
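# SkillOpt: A Text-Space Optimizer for Agent Skills
> **Paper**: Yang et al. (Microsoft, SJTU, Tongji, Fudan, 2026), arXiv:2605.23904
> **Code**: https://github.com/microsoft/SkillOpt (MIT, 3.7k stars)
## Core Problem
Agent skills today are handwritten, generated in one shot, or loosely self-revised. **Nothing optimizes a skill as reliably as an optimizer does in deep learning.** If the skill is the agent's adaptation layer, it should be **trained systematically**, just like model parameters.
## Method: SkillOpt as Text-Space Optimizer
SkillOpt frames skill optimization as [[text-space-optimizer|an optimization problem in text space]], drawing a precise analogy to weight-space optimization in deep learning: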
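| Deep learning | SkillOpt |
|---------------|----------|
| Parameters θ | Skill document |
| Gradient direction | Edit direction derived from trajectory feedback |
| Learning rate | Text edit budget (bounded edits) |
| Validation | [[held-out-validation-gate\|Held-out validation gate]] |
| Momentum | [[slow-meta-update\|Epoch-wise slow/meta update]] |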
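
The "learning rate" row can be made concrete with a minimal sketch. It assumes a skill is a list of lines and that every proposed edit already carries an aggregated ranking score. The `Edit` schema, the `apply_bounded` helper, and the `score` field are hypothetical names chosen for illustration, not SkillOpt's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Edit:
    """One proposed change to a skill document (hypothetical schema)."""
    op: str                 # "add", "delete", or "replace"
    target: Optional[str]   # existing line to delete or replace; None for "add"
    text: Optional[str]     # new line for add or replace; None for "delete"
    score: float            # aggregated ranking score, assumed to be given


def apply_bounded(skill: List[str], edits: List[Edit], budget: int) -> List[str]:
    """Apply at most `budget` of the highest-scoring edits.

    The budget caps how far one update can move the skill, which is the
    role a learning rate plays in weight space.
    """
    lines = list(skill)  # copy so the caller's skill is never mutated
    chosen = sorted(edits, key=lambda e: e.score, reverse=True)[:budget]
    for e in chosen:
        if e.op == "add" and e.text is not None:
            lines.append(e.text)
        elif e.op == "delete" and e.target in lines:
            lines.remove(e.target)
        elif e.op == "replace" and e.target in lines and e.text is not None:
            lines[lines.index(e.target)] = e.text
        # An edit whose target is missing is skipped but still uses budget.
    return lines
```

In this sketch a small budget keeps each update conservative, and a budget of zero leaves the skill unchanged.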
### Core Loop
```
Frozen Agent + Skill → sample rollout batch →
Optimizer analyzes successes and failures → proposes add/delete/replace edits →
aggregate and rank → bounded update → Validation Gate →
Accept (best_skill.md) / Reject → [[rejected-edit-buffer\|buffer records the failure pattern]]
```
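
The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: a skill is modeled as a tuple of lines, an edit as a function from skill to skill, and the `rollout`, `propose`, and `validate` callbacks are hypothetical stand-ins for the frozen agent, the optimizer model, and the held-out evaluation. Aggregation and ranking are assumed to happen inside `propose`, which returns edits best-first.

```python
from typing import Callable, List, Tuple

Skill = Tuple[str, ...]           # a skill document as an immutable tuple of lines
Edit = Callable[[Skill], Skill]   # hypothetical: an edit maps a skill to a new skill


def optimize_skill(
    skill: Skill,
    rollout: Callable[[Skill], List[str]],
    propose: Callable[[Skill, List[str], List[Skill]], List[Edit]],
    validate: Callable[[Skill], float],
    steps: int,
    budget: int,
) -> Tuple[Skill, List[Skill]]:
    """Run `steps` gated updates; return the best skill and the rejected candidates."""
    best, best_score = skill, validate(skill)
    rejected: List[Skill] = []
    for _ in range(steps):
        feedback = rollout(best)                    # sample a rollout batch with the frozen agent
        edits = propose(best, feedback, rejected)   # ranked add/delete/replace proposals
        candidate = best
        for edit in edits[:budget]:                 # bounded update: at most `budget` edits
            candidate = edit(candidate)
        score = validate(candidate)                 # held-out validation gate
        if score > best_score:
            best, best_score = candidate, score     # accept: candidate becomes the best skill
        else:
            rejected.append(candidate)              # reject: keep the failed candidate in the buffer
    return best, rejected
```

Passing `rejected` back into `propose` lets the proposer avoid repeating failed edits, which is the role this sketch assigns to the rejected-edit buffer. The strict `>` in the gate is a design choice here: a candidate that merely ties the current best is rejected.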
## Key Results
- **52/52 best or tied**: across 6 benchmarks × 7 models × 3 harnesses (direct chat, Codex, Claude Code)
- Average gain for GPT-5.5 + SkillOpt: **+23.5 pts** (direct), **+24.8** (Codex), **+19.1** (Claude Code)
- **Transfer across models, harnesses, and benchmarks**: train once, reuse in many places
- Skills are extremely compact: **300-2,000 tokens**, needing only 14 accepted edits
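## Core Insight
The deeper philosophy of SkillOpt: **agent adaptation does not have to change model weights; the skill document is itself a trainable "external state"**. By bringing in the control discipline of deep learning optimizers (learning rate, validation gate, momentum), skill optimization turns from ad hoc tweaking into a reproducible training process.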
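## Concept Network
- [[skillopt|SkillOpt]]: method overview
- [[text-space-optimizer]]: the paradigm analogy for text-space optimization
- [[textual-learning-rate]]: edit budget control
- [[held-out-validation-gate]]: held-out validation gate
- [[rejected-edit-buffer]]: negative feedback from failed edits
- [[slow-meta-update]]: epoch-wise momentum
- [[skill-as-external-state]]: the philosophy of the skill as trainable external state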