---
title: "SkillOpt: Executive Strategy for Self-Evolving Agent Skills"
created: 2026-05-29
type: paper-raw
arxiv: "2605.23904"
authors: ["Yifan Yang", "Ziyang Gong", "Weiquan Huang", "Qihao Yang", "Ziwei Zhou", "Zisu Huang", "Yan Li", "Xuemei Gao", "Qi Dai", "Bei Liu", "Kai Qiu", "Yuqing Yang", "Dongdong Chen", "Xue Yang", "Chong Luo"]
venue: "arXiv preprint (cs.AI), v2, May 2026"
affiliation: "Microsoft, Shanghai Jiao Tong University, Tongji University, Fudan University"
tags: ["agent", "skill", "optimization", "text-space", "self-evolving"]
---
# SkillOpt: Executive Strategy for Self-Evolving Agent Skills
**Authors:** Yifan Yang*, Ziyang Gong*, Weiquan Huang*, Qihao Yang*, Ziwei Zhou*, Zisu Huang*, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo (* equal contribution)
**Affiliation:** Microsoft, SJTU, Tongji, Fudan
**arXiv:** [2605.23904](https://arxiv.org/abs/2605.23904) (v2, 25 May 2026)
**Code:** https://aka.ms/SkillOpt
## Abstract
Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision—none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent. SkillOpt is the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses, SkillOpt is best or tied on all 52 evaluated cells and beats every per-cell competitor. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside Codex, and by +19.1 inside Claude Code.
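
The accept/reject loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Edit`, `edit_size`, `apply_edit`, and `train_skill` are hypothetical, the "textual learning-rate budget" is assumed here to be a cap on the number of words an edit may touch, and the epoch-wise slow/meta update is omitted. The `propose` callable stands in for the separate optimizer model, and `validate` for scoring on the held-out validation set.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Edit:
    """One bounded edit on the skill document (hypothetical structure)."""
    op: str        # "add", "delete", or "replace"
    old: str = ""  # text to remove (used by delete/replace)
    new: str = ""  # text to insert (used by add/replace)


def edit_size(edit: Edit) -> int:
    """Words touched by the edit; an assumed proxy for the textual learning rate."""
    return len(edit.old.split()) + len(edit.new.split())


def apply_edit(skill: str, edit: Edit) -> str:
    """Return a new skill document with the edit applied."""
    if edit.op == "add":
        return f"{skill}\n{edit.new}" if skill else edit.new
    if edit.op == "delete":
        return skill.replace(edit.old, "", 1)
    if edit.op == "replace":
        return skill.replace(edit.old, edit.new, 1)
    raise ValueError(f"unknown edit op: {edit.op!r}")


def train_skill(
    skill: str,
    propose: Callable[[str, List[Edit]], Edit],  # stand-in for the optimizer model
    validate: Callable[[str], float],            # held-out validation score
    steps: int,
    budget: int,
) -> Tuple[str, float, List[Edit]]:
    """Keep an edit only if it strictly improves the validation score."""
    best_score = validate(skill)
    rejected: List[Edit] = []  # rejected-edit buffer, fed back to the proposer
    for _ in range(steps):
        edit = propose(skill, rejected)
        if edit_size(edit) > budget:
            # Over the assumed learning-rate budget: reject without scoring.
            rejected.append(edit)
            continue
        candidate = apply_edit(skill, edit)
        score = validate(candidate)
        if score > best_score:  # strict improvement gate
            skill, best_score = candidate, score
        else:
            rejected.append(edit)
    return skill, best_score, rejected
```

Because the loop only changes the skill text, the returned document can be handed to the frozen agent as-is, which matches the abstract's claim of zero extra inference-time model calls at deployment.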
## Key Contributions
1. **Text-space optimizer**: First systematic optimizer for agent skills with deep-learning-style controls (learning rate, validation gate, momentum)
2. **52/52 best/tied**: Best or tied on all 52 evaluated cells, drawn from 6 benchmarks, 7 target models, and 3 execution harnesses
3. **Cross-domain transfer**: Skills trained on one model/harness/benchmark transfer positively to others
4. **Compact artifacts**: 300 to 2,000 tokens after 14 accepted edits