---
title: "SkillOpt: A Text-Space Optimizer for Agent Skills"
created: 2026-05-29
updated: 2026-05-29
type: paper
arxiv: "2605.23904"
authors: ["Yifan Yang", "Ziyang Gong", "Weiquan Huang", "Qihao Yang", "Ziwei Zhou", "Zisu Huang", "Yan Li", "Xuemei Gao", "Qi Dai", "Bei Liu", "Kai Qiu", "Yuqing Yang", "Dongdong Chen", "Xue Yang", "Chong Luo"]
venue: "arXiv cs.AI, May 2026"
tags: ["agent", "skill", "optimization", "text-space", "self-evolving"]
sources: ["https://arxiv.org/abs/2605.23904"]
---

# SkillOpt: A Text-Space Optimizer for Agent Skills

> **Paper**: Yang et al. (Microsoft, SJTU, Tongji, Fudan, 2026), arXiv:2605.23904

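## Core Problem

Today, agent skills are written by hand, generated in a single pass, or loosely self-revised. **Nothing optimizes a skill as reliably as an optimizer trains a deep learning model.** If a skill is the agent's adaptation layer, it should be **trained systematically**, the way model parameters are.
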
## Method: SkillOpt as a Text-Space Optimizer

SkillOpt frames skill optimization as [[text-space-optimizer|an optimization problem in text space]], drawing a precise analogy to weight-space optimization in deep learning:

| Deep learning | SkillOpt |
|---------------|----------|
| Parameters θ | Skill document |
| Gradient direction | Edit direction derived from trajectory feedback |
| Learning rate | Text edit budget (bounded edits) |
| Validation | [[held-out-validation-gate\|held-out validation gate]] |
| Momentum | [[slow-meta-update\|epoch-wise slow/meta update]] |

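The learning-rate row can be made concrete with a minimal sketch. It assumes a skill document is a list of lines and that each proposed edit is one ranked add/delete/replace operation; the budget `max_edits` then plays the role of the learning rate by capping how many edits land per step. `Edit`, `apply_bounded_edits`, and `max_edits` are hypothetical names for illustration, not the paper's API, and the paper may measure the edit budget differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    op: str                     # "add", "delete", or "replace" (hypothetical encoding)
    index: int                  # target line index in the skill document
    text: Optional[str] = None  # new line content for "add" / "replace"
    score: float = 0.0          # aggregated ranking score from the optimizer


def apply_bounded_edits(skill: List[str], edits: List[Edit], max_edits: int) -> List[str]:
    """Apply at most `max_edits` top-ranked edits: a textual 'learning rate'."""
    chosen: List[Edit] = []
    used = set()
    for e in sorted(edits, key=lambda e: e.score, reverse=True):
        if len(chosen) >= max_edits:
            break
        if e.index in used:
            continue  # at most one edit per line keeps indices unambiguous
        chosen.append(e)
        used.add(e.index)

    new_skill = list(skill)  # never mutate the input document
    # Apply from the highest index down so lower indices stay valid.
    for e in sorted(chosen, key=lambda e: e.index, reverse=True):
        if e.op == "add":
            new_skill.insert(e.index, e.text)
        elif e.op == "delete":
            del new_skill[e.index]
        elif e.op == "replace":
            new_skill[e.index] = e.text
        else:
            raise ValueError(f"unknown op: {e.op}")
    return new_skill


skill = ["Read the task.", "Answer directly.", "Be brief."]
edits = [
    Edit("replace", 1, "Check the output format before answering.", score=0.9),
    Edit("delete", 2, score=0.4),
    Edit("add", 0, "Restate the goal first.", score=0.7),
]
result = apply_bounded_edits(skill, edits, max_edits=2)
print(result)
```

With `max_edits=2`, the low-scoring delete is dropped and only the replace and the add are applied; a smaller budget moves the skill a shorter distance per step, which is the intuition behind calling it a learning rate.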
### Core Loop

```
Frozen Agent + Skill → sample a rollout batch →
Optimizer analyzes successes and failures → proposes add/delete/replace edits →
aggregate and rank → bounded update → Validation Gate →
Accept (best_skill.md) / Reject → rejected-edit buffer records the failure pattern
```

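One way to read the loop above as code is the following self-contained sketch. It assumes the frozen agent, the optimizer LLM, and the held-out evaluator are supplied as plain callables (`rollout`, `propose`, `validate`), that a skill is a single string, and that the gate accepts only strict improvements. All of these names and the toy functions are hypothetical stand-ins, not SkillOpt's released code.

```python
from typing import Callable, List, Tuple

Skill = str
# (description, rank score, function that applies the edit to a skill)
Edit = Tuple[str, float, Callable[[Skill], Skill]]


def skill_opt(
    skill: Skill,
    rollout: Callable[[Skill], List[str]],                         # frozen agent: skill -> trajectories
    propose: Callable[[Skill, List[str], List[str]], List[Edit]],  # optimizer: trajectories + buffer -> edits
    validate: Callable[[Skill], float],                            # held-out score, higher is better
    max_edits: int = 2,                                            # textual learning rate
    steps: int = 5,
) -> Tuple[Skill, List[str]]:
    best_skill, best_score = skill, validate(skill)
    rejected: List[str] = []  # rejected-edit buffer
    for _ in range(steps):
        trajectories = rollout(best_skill)
        edits = propose(best_skill, trajectories, rejected)
        # Aggregate and rank, then keep only the top `max_edits` (bounded update).
        top = sorted(edits, key=lambda e: e[1], reverse=True)[:max_edits]
        if not top:
            break
        candidate = best_skill
        for _, _, apply_edit in top:
            candidate = apply_edit(candidate)
        score = validate(candidate)
        if score > best_score:  # validation gate: accept strict improvements only
            best_skill, best_score = candidate, score
        else:                   # reject and remember the failure pattern
            rejected.extend(desc for desc, _, _ in top)
    return best_skill, rejected


# --- Toy stand-ins (hypothetical) ---
REQUIRED = ["check units", "cite sources"]


def toy_rollout(skill: Skill) -> List[str]:
    # One "failure trajectory" per requirement the skill does not mention.
    return [f"missed: {r}" for r in REQUIRED if r not in skill]


def toy_propose(skill: Skill, trajectories: List[str], rejected: List[str]) -> List[Edit]:
    edits: List[Edit] = []
    for t in trajectories:
        req = t.split(": ", 1)[1]
        edits.append((f"add {req}", 1.0, lambda s, r=req: s + "\n- " + r))
    if "delete all" not in rejected:  # harmful edit, ranked highest until rejected
        edits.append(("delete all", 2.0, lambda s: ""))
    return edits


def toy_validate(skill: Skill) -> float:
    return sum(r in skill for r in REQUIRED)


final, buffer = skill_opt("# Skill", toy_rollout, toy_propose, toy_validate, max_edits=1)
print(final)
print(buffer)
```

In the toy run, the top-ranked but harmful `delete all` edit fails the gate, lands in the buffer, and is not proposed again; the two useful edits are then accepted one per step.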
## Key Results

- **52/52 best or tied**: across settings spanning 6 benchmarks, 7 models, and 3 harnesses (direct chat, Codex, Claude Code)
- Average gain for GPT-5.5 + SkillOpt: **+23.5 pts** (direct chat), **+24.8 pts** (Codex), **+19.1 pts** (Claude Code)
- **Transfer across models, harnesses, and benchmarks**: train once, reuse in many settings
- Skills are highly compact: **300–2,000 tokens**, reached after only 1–4 accepted edits

|
||
## 核心洞察
|
||
|
||
SkillOpt 的深层哲学:**Agent 的适应不一定要改模型权重——skill 文档就是一个可训练的"外部状态"**。通过引入 deep learning optimizer 的控制纪律(learning rate、validation gate、momentum),skill optimization 从"随便改改"变成了可复现的训练过程。
|
||
|
||
## Concept Network

- [[skillopt|SkillOpt]]: method overview
- [[text-space-optimizer]]: the text-space optimization analogy
- [[textual-learning-rate]]: edit-budget control
- [[held-out-validation-gate]]: held-out validation gate
- [[rejected-edit-buffer]]: negative feedback from failed edits
- [[slow-meta-update]]: epoch-wise momentum
- [[skill-as-external-state]]: the philosophy of skills as trainable external state