---
title: "SkillOpt: Executive Strategy for Self-Evolving Agent Skills"
created: 2026-05-29
type: paper-raw
arxiv: "2605.23904"
authors: ["Yifan Yang", "Ziyang Gong", "Weiquan Huang", "Qihao Yang", "Ziwei Zhou", "Zisu Huang", "Yan Li", "Xuemei Gao", "Qi Dai", "Bei Liu", "Kai Qiu", "Yuqing Yang", "Dongdong Chen", "Xue Yang", "Chong Luo"]
venue: "arXiv preprint (cs.AI), v2, May 2026"
affiliation: "Microsoft, Shanghai Jiao Tong University, Tongji University, Fudan University"
tags: ["agent", "skill", "optimization", "text-space", "self-evolving"]
---

# SkillOpt: Executive Strategy for Self-Evolving Agent Skills

**Authors:** Yifan Yang*, Ziyang Gong*, Weiquan Huang*, Qihao Yang*, Ziwei Zhou*, Zisu Huang*, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo (* equal contribution)
**Affiliation:** Microsoft, Shanghai Jiao Tong University (SJTU), Tongji University, Fudan University
**arXiv:** [2605.23904](https://arxiv.org/abs/2605.23904) (v2, 25 May 2026)
**Code:** https://github.com/microsoft/SkillOpt (MIT License, 3.7k stars)

## Abstract

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision—none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent. SkillOpt is the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses, SkillOpt is best or tied on all 52 evaluated cells and beats every per-cell competitor. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside Codex, and by +19.1 inside Claude Code.
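
The acceptance rule described in the abstract can be illustrated with a small sketch. This is not the SkillOpt implementation: `train_skill`, the toy proposer, and the toy scorer are hypothetical names invented here, the optimizer model is replaced by a fixed list of candidate lines, and the held-out validation score is a stand-in that counts useful lines. Only the control flow is meant to mirror the text: an edit is kept only on strict improvement, and rejected edits are remembered so they are not proposed again.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: in the paper an optimizer model proposes edits from
# scored rollouts; here a fixed list of candidate lines plays that role.
CANDIDATE_LINES = ["check units", "cite sources", "be verbose", "verify the final answer"]
USEFUL_LINES = {"check units", "verify the final answer"}


def toy_validation_score(skill: str) -> float:
    """Assumed proxy for a held-out validation score: fraction of useful lines present."""
    return len(set(skill.splitlines()) & USEFUL_LINES) / len(USEFUL_LINES)


def toy_propose_edit(skill: str, rejected: List[str]) -> Optional[str]:
    """Propose the first candidate line that is neither present nor already rejected."""
    present = set(skill.splitlines())
    for line in CANDIDATE_LINES:
        if line not in present and line not in rejected:
            return line
    return None


def toy_apply_edit(skill: str, edit: str) -> str:
    """Apply an 'add' edit by appending one line to the skill document."""
    return skill + "\n" + edit


def train_skill(
    skill: str,
    propose_edit: Callable[[str, List[str]], Optional[str]],
    apply_edit: Callable[[str, str], str],
    validation_score: Callable[[str], float],
    steps: int = 10,
) -> Tuple[str, float, List[str]]:
    """Validation-gated loop: keep an edit only if the score strictly improves."""
    best = validation_score(skill)
    rejected: List[str] = []  # rejected-edit buffer, fed back to the proposer
    for _ in range(steps):
        edit = propose_edit(skill, rejected)
        if edit is None:
            break  # nothing left to try
        candidate = apply_edit(skill, edit)
        score = validation_score(candidate)
        if score > best:  # strict inequality: ties are rejected
            skill, best = candidate, score
        else:
            rejected.append(edit)
    return skill, best, rejected


final_skill, final_score, rejected_edits = train_skill(
    "You are a careful assistant.",
    toy_propose_edit,
    toy_apply_edit,
    toy_validation_score,
)
print(final_skill)
print(final_score, rejected_edits)
```

Because the gate uses a strict inequality, the returned skill never scores below the starting document on the validation set, which is the property the abstract emphasizes.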

## Key Contributions

1. **Text-space optimizer**: First systematic optimizer for agent skills with deep-learning-style controls (learning rate, validation gate, momentum)
2. **52/52 best or tied**: Best or tied on all 52 evaluated cells, drawn from 6 benchmarks, 7 target models, and 3 execution harnesses
3. **Cross-domain transfer**: Skills trained on one model/harness/benchmark transfer positively to others
4. **Compact artifacts**: Trained skill documents run 300–2,000 tokens after 1–4 accepted edits
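
As a rough illustration of the "learning rate" control in the first contribution, the sketch below applies add/delete/replace edits to a skill document under a per-step budget. The `Edit` record, the `apply_edits` function, and the choice to measure the budget in whitespace-separated words touched are all assumptions made for this example; the source does not specify how SkillOpt represents edits or measures its textual learning-rate budget.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Edit:
    """Hypothetical line-level edit record (not the paper's format)."""
    op: str                      # "add", "delete", or "replace"
    index: int                   # line position in the current document
    text: Optional[str] = None   # new line for add/replace; unused for delete


def edit_cost(edit: Edit, lines: List[str]) -> int:
    """Assumed cost model: words removed plus words inserted."""
    if edit.op not in ("add", "delete", "replace"):
        raise ValueError(f"unknown op: {edit.op}")
    if edit.op in ("add", "replace") and edit.text is None:
        raise ValueError(f"{edit.op} requires text")
    removed = len(lines[edit.index].split()) if edit.op in ("delete", "replace") else 0
    added = len(edit.text.split()) if edit.op in ("add", "replace") else 0
    return removed + added


def apply_edits(skill: str, edits: List[Edit], budget: int) -> str:
    """Apply edits in order until the next one would exceed the word budget."""
    lines = skill.splitlines()
    spent = 0
    for edit in edits:
        cost = edit_cost(edit, lines)
        if spent + cost > budget:
            break  # budget exhausted: remaining edits are dropped
        if edit.op == "add":
            lines.insert(edit.index, edit.text)
        elif edit.op == "delete":
            del lines[edit.index]
        else:  # "replace"
            lines[edit.index] = edit.text
        spent += cost
    return "\n".join(lines)


skill = "Read the task carefully.\nAnswer quickly.\nUse tools when needed."
proposed = [
    Edit("replace", 1, "Verify the answer before responding."),
    Edit("add", 3, "Report units with every number."),
    Edit("delete", 0),
]
print(apply_edits(skill, proposed, budget=12))
```

With a budget of 12 words, the replace and the add are applied and the delete is dropped. A smaller budget keeps each step closer to the previous document, which is the sense in which the budget acts like a learning rate.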