20260601

2026-06-01 10:46:01 +08:00
parent 2faf4bb002
commit e96b955fda
221 changed files with 10219 additions and 332 deletions
--- a/raw/papers/yang-skillopt-2026.md
+++ b/raw/papers/yang-skillopt-2026.md
@@ -0,0 +1,28 @@
+---
+title: "SkillOpt: Executive Strategy for Self-Evolving Agent Skills"
+created: 2026-05-29
+type: paper-raw
+arxiv: "2605.23904"
+authors: ["Yifan Yang", "Ziyang Gong", "Weiquan Huang", "Qihao Yang", "Ziwei Zhou", "Zisu Huang", "Yan Li", "Xuemei Gao", "Qi Dai", "Bei Liu", "Kai Qiu", "Yuqing Yang", "Dongdong Chen", "Xue Yang", "Chong Luo"]
+venue: "arXiv preprint (cs.AI), v2, May 2026"
+affiliation: "Microsoft, Shanghai Jiao Tong University, Tongji University, Fudan University"
+tags: ["agent", "skill", "optimization", "text-space", "self-evolving"]
+---
+
+# SkillOpt: Executive Strategy for Self-Evolving Agent Skills
+
+**Authors:** Yifan Yang*, Ziyang Gong*, Weiquan Huang*, Qihao Yang*, Ziwei Zhou*, Zisu Huang*, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo (* equal contribution)
+**Affiliation:** Microsoft, SJTU, Tongji, Fudan
+**arXiv:** [2605.23904](https://arxiv.org/abs/2605.23904) (v2, 25 May 2026)
+**Code:** https://aka.ms/SkillOpt
+
+## Abstract
+
+Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision—none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent. SkillOpt is the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses, SkillOpt is best or tied on all 52 evaluated cells and beats every per-cell competitor. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside Codex, and by +19.1 inside Claude Code.
+
+## Key Contributions
+
+1. **Text-space optimizer**: First systematic optimizer for agent skills with deep-learning-style controls (learning rate, validation gate, momentum)
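+2. **52/52 best or tied**: Best or tied on all 52 evaluated cells, which span 6 benchmarks, 7 target models, and 3 execution harnesses (the 52 cells are a subset of the full 6 × 7 × 3 grid)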
+3. **Cross-domain transfer**: Skills trained on one model/harness/benchmark transfer positively to others
+4. **Compact artifacts**: 300–2,000 tokens after 1–4 accepted edits