---
title: "SkillOpt: Executive Strategy for Self-Evolving Agent Skills"
created: 2026-05-29
type: paper-raw
arxiv: "2605.23904"
authors: ["Yifan Yang", "Ziyang Gong", "Weiquan Huang", "Qihao Yang", "Ziwei Zhou", "Zisu Huang", "Yan Li", "Xuemei Gao", "Qi Dai", "Bei Liu", "Kai Qiu", "Yuqing Yang", "Dongdong Chen", "Xue Yang", "Chong Luo"]
venue: "arXiv preprint (cs.AI), v2, May 2026"
affiliation: "Microsoft, Shanghai Jiao Tong University, Tongji University, Fudan University"
tags: ["agent", "skill", "optimization", "text-space", "self-evolving"]
---

# SkillOpt: Executive Strategy for Self-Evolving Agent Skills

**Authors:** Yifan Yang*, Ziyang Gong*, Weiquan Huang*, Qihao Yang*, Ziwei Zhou*, Zisu Huang*, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo (* equal contribution)
**Affiliation:** Microsoft, SJTU, Tongji, Fudan
**arXiv:** [2605.23904](https://arxiv.org/abs/2605.23904) (v2, 25 May 2026)
**Code:** https://github.com/microsoft/SkillOpt (MIT License, 3.7k stars)

## Abstract

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision—none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent. SkillOpt is the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses, SkillOpt is best or tied on all 52 evaluated cells and beats every per-cell competitor. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside Codex, and by +19.1 inside Claude Code.
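The abstract's acceptance rule keeps an edit only if it strictly improves a held-out validation score, and it buffers rejected edits. That rule can be sketched as a small loop. This is a minimal illustration, not the SkillOpt implementation. `propose` stands in for the optimizer model and `val_score` for rollout-based scoring on the validation split; both are hypothetical callables. The rejected-edit buffer is assumed to store whole rejected candidate documents, and the epoch-wise slow/meta update is not modeled.

```python
from typing import Callable, List, Set, Tuple

Skill = Tuple[str, ...]  # a skill document as an immutable tuple of lines


def train_skill(
    skill: Skill,
    propose: Callable[[Skill, Set[Skill]], Skill],  # hypothetical optimizer-model call
    val_score: Callable[[Skill], float],            # hypothetical held-out evaluator
    steps: int,
) -> Tuple[Skill, List[Skill]]:
    """Keep a proposed skill only if it strictly beats the current validation score."""
    best = val_score(skill)
    rejected: Set[Skill] = set()  # assumed rejected-edit buffer, shown to the proposer
    accepted: List[Skill] = []
    for _ in range(steps):
        candidate = propose(skill, rejected)
        if candidate == skill or candidate in rejected:
            continue  # no-op or previously rejected proposal: skip re-evaluation
        score = val_score(candidate)
        if score > best:  # strict improvement gate; ties are rejected
            skill, best = candidate, score
            accepted.append(candidate)
        else:
            rejected.add(candidate)
    return skill, accepted


# Toy stand-ins for the optimizer model and the validation evaluator.
IDEAS = ["Answer with a single number.", "Be verbose.", "Check units of every number."]


def toy_propose(skill: Skill, rejected: Set[Skill]) -> Skill:
    # Append the first idea that is not already present and not previously rejected.
    for idea in IDEAS:
        candidate = skill + (idea,)
        if idea not in skill and candidate not in rejected:
            return candidate
    return skill


def toy_score(skill: Skill) -> float:
    # Reward lines mentioning "number"; penalize verbosity.
    return sum("number" in line for line in skill) - 0.5 * sum(
        "verbose" in line.lower() for line in skill
    )


final, accepted = train_skill(("Read the question.",), toy_propose, toy_score, steps=3)
print(final)
print(len(accepted))
```

In this toy run, the "Be verbose." candidate lowers the score, so it is rejected and buffered. Only the two score-raising edits survive. The gate uses strict `>` rather than `>=`, so a candidate that merely ties the current score is also rejected. This keeps the document from drifting through score-neutral rewrites.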

## Key Contributions

1. **Text-space optimizer**: First systematic optimizer for agent skills with deep-learning-style controls (a textual learning-rate budget, a strict validation gate, a rejected-edit buffer, and an epoch-wise slow/meta update)
2. **52/52 best/tied**: Best or tied on all 52 evaluated cells, a subset of the 6 benchmarks × 7 target models × 3 execution harnesses grid
3. **Cross-domain transfer**: Skills trained on one model/harness/benchmark transfer positively to others
4. **Compact artifacts**: 300–2,000 tokens after 1–4 accepted edits
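The bounded add/delete/replace edits and the textual learning-rate budget can be illustrated with a line-based edit applier. This is a hedged sketch under assumptions the source does not state. The skill is treated as a list of lines, and each edit targets one line index in the pre-edit document. The learning-rate budget is modeled as a cap on the number of edits per step. `Edit` and `apply_edits` are hypothetical names, not the SkillOpt API.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional


@dataclass(frozen=True)
class Edit:
    """Hypothetical bounded edit on a line-based skill document."""
    op: Literal["add", "delete", "replace"]
    index: int                  # line position in the original (pre-edit) document
    text: Optional[str] = None  # new line for "add"/"replace"; unused for "delete"


def apply_edits(skill: List[str], edits: List[Edit], lr_budget: int) -> List[str]:
    """Return a new document with `edits` applied; `skill` itself is not modified.

    `lr_budget` models a textual learning rate as the maximum number of edits
    allowed in one step (an assumption for illustration).
    """
    if len(edits) > lr_budget:
        raise ValueError(f"{len(edits)} edits exceed learning-rate budget {lr_budget}")
    if len({e.index for e in edits}) != len(edits):
        raise ValueError("at most one edit per line index")
    n = len(skill)
    for e in edits:
        if e.op not in ("add", "delete", "replace"):
            raise ValueError(f"unknown op {e.op!r}")
        limit = n if e.op == "add" else n - 1  # "add" may append at position n
        if not 0 <= e.index <= limit:
            raise ValueError(f"index {e.index} out of range for {e.op}")
        if e.op != "delete" and e.text is None:
            raise ValueError(f"{e.op} needs text")
    doc = list(skill)
    # Apply from the highest index down so lower indices still refer to original lines.
    for e in sorted(edits, key=lambda e: e.index, reverse=True):
        if e.op == "add":
            doc.insert(e.index, e.text)
        elif e.op == "delete":
            del doc[e.index]
        else:
            doc[e.index] = e.text
    return doc


skill = ["# Skill: table QA", "Read the question.", "Answer briefly."]
edits = [
    Edit("replace", 2, "Answer with a single number when possible."),
    Edit("add", 1, "Locate the relevant table column first."),
]
print(apply_edits(skill, edits, lr_budget=2))
```

Over-budget proposals raise an error instead of being truncated, which keeps the step size explicit. A real optimizer might instead ask the model to re-propose a smaller edit set.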