20260601
28
raw/papers/yang-skillopt-2026.md
Normal file
@@ -0,0 +1,28 @@
---
title: "SkillOpt: Executive Strategy for Self-Evolving Agent Skills"
created: 2026-05-29
type: paper-raw
arxiv: "2605.23904"
authors: ["Yifan Yang", "Ziyang Gong", "Weiquan Huang", "Qihao Yang", "Ziwei Zhou", "Zisu Huang", "Yan Li", "Xuemei Gao", "Qi Dai", "Bei Liu", "Kai Qiu", "Yuqing Yang", "Dongdong Chen", "Xue Yang", "Chong Luo"]
venue: "arXiv preprint (cs.AI), v2, May 2026"
affiliation: "Microsoft, Shanghai Jiao Tong University, Tongji University, Fudan University"
tags: ["agent", "skill", "optimization", "text-space", "self-evolving"]
---

# SkillOpt: Executive Strategy for Self-Evolving Agent Skills

**Authors:** Yifan Yang*, Ziyang Gong*, Weiquan Huang*, Qihao Yang*, Ziwei Zhou*, Zisu Huang*, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo (* equal contribution)
**Affiliation:** Microsoft, SJTU, Tongji, Fudan
**arXiv:** [2605.23904](https://arxiv.org/abs/2605.23904) (v2, 25 May 2026)
**Code:** https://aka.ms/SkillOpt

## Abstract

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision. None of these behaves like a deep-learning optimizer for the skill, and none reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent. SkillOpt is the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses, SkillOpt is best or tied on all 52 evaluated cells and beats every per-cell competitor. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside Codex, and by +19.1 inside Claude Code.
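
The acceptance rule in the abstract (keep an edit only if it strictly improves a held-out validation score, and remember rejected edits) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `GatedSkillTrainer`, `toy_score`, and the idea of keying the rejected-edit buffer on the full candidate text are all assumptions made for illustration, and the toy scorer stands in for real scored rollouts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GatedSkillTrainer:
    """Holds one skill document; accepts only strictly improving candidates."""

    skill: str
    score_fn: Callable[[str], float]  # hypothetical held-out validation scorer
    rejected: List[str] = field(default_factory=list)  # rejected-edit buffer
    best_score: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Score the starting skill once so later candidates have a baseline.
        self.best_score = self.score_fn(self.skill)

    def step(self, candidate: str) -> bool:
        """Try one candidate skill; return True if it was accepted."""
        if candidate in self.rejected:
            return False  # already rejected once, so skip re-scoring
        score = self.score_fn(candidate)
        if score > self.best_score:  # strict improvement: ties are rejected
            self.skill, self.best_score = candidate, score
            return True
        self.rejected.append(candidate)
        return False


def toy_score(skill: str) -> float:
    """Toy stand-in for validation accuracy: fraction of required phrases present."""
    required = ("check units", "cite sources")
    return sum(phrase in skill for phrase in required) / len(required)


trainer = GatedSkillTrainer(skill="Answer concisely.", score_fn=toy_score)
results = [
    trainer.step("Answer concisely. Always check units."),  # 0.0 -> 0.5
    trainer.step("Answer briefly. Always check units."),    # ties at 0.5
    trainer.step("Answer concisely. Always check units and cite sources."),
]
print(results, trainer.best_score)
```

Because the gate uses a strict `>` comparison, the second candidate (which only ties the current score) is rejected and lands in the buffer, so the skill never drifts sideways on validation noise in this toy setup.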

## Key Contributions

1. **Text-space optimizer**: First systematic optimizer for agent skills with deep-learning-style controls (learning rate, validation gate, momentum)
2. **52/52 best/tied**: Best or tied on all 52 evaluated cells, spanning 6 benchmarks, 7 target models, and 3 execution harnesses
3. **Cross-domain transfer**: Skills trained on one model/harness/benchmark transfer positively to others
4. **Compact artifacts**: 300–2,000 tokens after 1–4 accepted edits
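
To make the "bounded add/delete/replace edits" and the learning-rate-style control more concrete, here is a small Python sketch. This note does not specify the paper's edit format or how the textual learning-rate budget is measured, so the sketch assumes line-level edits and treats the budget as a cap on the number of edits per step. `Edit`, `apply_edits`, and `budget` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Edit:
    op: str         # "add", "delete", or "replace"
    line: int       # 0-based line index in the skill document
    text: str = ""  # new text for "add" and "replace"; unused for "delete"


def apply_edits(skill: str, edits: List[Edit], budget: int) -> str:
    """Apply at most `budget` line-level edits and return the new skill text."""
    if len(edits) > budget:
        raise ValueError(f"{len(edits)} edits exceed budget of {budget}")
    lines = skill.splitlines()
    # Apply from the bottom up so earlier line indices stay valid.
    for edit in sorted(edits, key=lambda e: e.line, reverse=True):
        if edit.op == "add":
            lines.insert(edit.line, edit.text)
        elif edit.op == "delete":
            del lines[edit.line]
        elif edit.op == "replace":
            lines[edit.line] = edit.text
        else:
            raise ValueError(f"unknown op: {edit.op}")
    return "\n".join(lines)


skill = "Read the task.\nGuess the answer.\nReply."
proposed = [
    Edit("replace", 1, "Work through the task step by step."),
    Edit("add", 3, "Double-check the result."),
]
new_skill = apply_edits(skill, proposed, budget=2)
print(new_skill)
```

A small per-step budget keeps each candidate close to the current skill, which is one plausible reading of why a trained skill can stay compact after only a handful of accepted edits.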
|
||||