20260601
52
reviews/yang-skillopt-review.md
Normal file
@@ -0,0 +1,52 @@
---
title: "Review: SkillOpt, a Text-Space Optimizer for Agent Skills"
created: 2026-05-29
type: review
paper: "yang-skillopt-2026"
arxiv: "2605.23904"
---

# 📌 Review: SkillOpt

**Paper**: SkillOpt: Executive Strategy for Self-Evolving Agent Skills
**Authors**: Yifan Yang, Ziyang Gong, Weiquan Huang et al. (15 authors)
**Affiliations**: Microsoft, SJTU, Tongji, Fudan
**arXiv**: 2605.23904 | **Field**: cs.AI | **Date**: 2026-05-29

---

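## 🎯 Core Concepts

1. **[[skillopt|SkillOpt]]**: the first systematic text-space optimizer for agent skills (52/52 best or tied)
2. **[[text-space-optimizer|Text-Space Optimizer]]**: frames skill training as optimization in text space, in exact analogy with weight-space optimization
3. **[[textual-learning-rate|Textual Learning Rate]]**: the edit budget L_t controls the optimization step size
4. **[[held-out-validation-gate|Held-Out Validation Gate]]**: a candidate edit is accepted only if it improves performance on the held-out set
5. **[[rejected-edit-buffer|Rejected-Edit Buffer]]**: failed edits serve as a negative feedback signal; the buffer is epoch-local
6. **[[slow-meta-update|Slow/Meta Update]]**: the text-space counterpart of momentum, capturing regularities that persist across epochs
7. **[[skill-as-external-state|Skill as External State]]**: adaptation does not have to change weights; the skill itself is trainable external state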
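
The interplay of the edit budget, the held-out gate, and the rejected-edit buffer listed above can be sketched as one optimization epoch. This is a minimal illustration under stated assumptions, not the paper's implementation: the `(old_text, new_text)` edit format, the character-count measure of edit size, the names `run_epoch`, `toy_score`, and `scripted_proposer`, and the choice to log over-budget edits in the buffer are all hypothetical. In a real system the proposer would presumably be an LLM that reads the rejected edits, and the score would come from running the agent on held-out tasks.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical edit format: (old_text, new_text); an empty old_text means "append".
Edit = Tuple[str, str]


def edit_size(edit: Edit) -> int:
    """Assumed proxy for edit magnitude: characters removed plus characters added."""
    old, new = edit
    return len(old) + len(new)


def apply_edit(skill: str, edit: Edit) -> str:
    """Apply one edit to the skill text (first occurrence only)."""
    old, new = edit
    if not old:
        return skill + new
    return skill.replace(old, new, 1)


def run_epoch(
    skill: str,
    propose: Callable[[str, List[Edit]], Edit],
    score_heldout: Callable[[str], float],
    budget: int,
    steps: int,
) -> Tuple[str, float, List[Edit]]:
    """One epoch: propose an edit, enforce the budget, gate on the held-out score."""
    rejected: List[Edit] = []  # epoch-local buffer, rebuilt from scratch every epoch
    best = score_heldout(skill)
    for _ in range(steps):
        edit = propose(skill, rejected)  # the proposer sees past failures
        if edit_size(edit) > budget:  # textual learning rate: cap the step size
            rejected.append(edit)  # assumption: over-budget edits are logged too
            continue
        candidate = apply_edit(skill, edit)
        candidate_score = score_heldout(candidate)
        if candidate_score > best:  # held-out gate: accept strict improvements only
            skill, best = candidate, candidate_score
        else:
            rejected.append(edit)
    return skill, best, rejected


def toy_score(skill: str) -> float:
    """Toy held-out metric: fraction of desired phrases present in the skill."""
    wanted = ("step by step", "cite sources")
    return sum(phrase in skill for phrase in wanted) / len(wanted)


def scripted_proposer(edits: Iterable[Edit]) -> Callable[[str, List[Edit]], Edit]:
    """Stand-in for an LLM proposer: replays a fixed list of edits in order."""
    queue = iter(edits)

    def propose(skill: str, rejected: List[Edit]) -> Edit:
        return next(queue)

    return propose


if __name__ == "__main__":
    edits = [
        ("", " Think step by step."),   # improves the toy score
        ("", " Be friendly."),          # no improvement on the held-out metric
        ("", " " + "x" * 100),          # larger than the edit budget
        ("", " Always cite sources."),  # improves the toy score
    ]
    final_skill, final_score, rejected = run_epoch(
        "Answer the question.", scripted_proposer(edits), toy_score, budget=40, steps=4
    )
    print(final_skill, final_score, len(rejected))
```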

---

## 🔗 Concept Network

**Core chain**: `skillopt` ↔ `text-space-optimizer` ↔ `textual-learning-rate` ↔ `held-out-validation-gate` ↔ `slow-meta-update`

**Feedback loop**: `held-out-validation-gate` → `rejected-edit-buffer` → optimizer → `held-out-validation-gate`

**Overarching philosophy**: `skill-as-external-state` → connects to `model-harness-relationship` + `heuristic-learning`

---

## 📚 Wiki Integration

- **New pages**: 10 (1 raw + 1 paper + 7 concepts + 1 review)
- **Link integrity**: 100%, no broken links ✅
- **Total size**: 527 → 535 pages

---

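## 💡 Key Insights

**1. "The analogy is operational, not decorative."** The most elegant aspect of SkillOpt is that, in its analogy to deep-learning optimizers, **every component has an operational counterpart**: learning rate → edit budget, validation → held-out gate, momentum → slow update. This is not a metaphor but a fully translated optimization framework. It may be the first time in the history of AI that anyone has treated "optimizing a natural-language artifact" this systematically.

**2. A paradigm shift from "editing parameters" to "editing documents."** SkillOpt states explicitly that adaptation ≠ weight update. The skill as trainable external state, together with `model-harness-relationship`, `heuristic-learning`, and `compiled-ai-paradigm` (all already being developed today), forms a complete narrative line: AI adaptation is migrating from inside the model (weights) to outside it (skill/harness/code). This is a deep trend that is highly consistent with the essential characteristics of the current GenAI wave (generative, general-purpose, unified).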
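
The momentum → slow update correspondence from the first insight can be made concrete with a small sketch. The review does not say how the slow update is computed, so everything here is an assumption made for illustration: lessons are plain strings, `slow_update` and `min_epochs` are hypothetical names, and "persistent" is read as "seen in at least `min_epochs` distinct epochs". The idea is only that per-epoch signals are noisy, while patterns that recur across epochs are worth carrying forward.

```python
from collections import Counter
from typing import Iterable, List


def slow_update(
    meta_notes: List[str],
    epoch_lessons: Iterable[List[str]],
    min_epochs: int = 2,
) -> List[str]:
    """Promote lessons that recur across epochs into persistent meta notes.

    Assumption: a lesson counts as persistent when it appears in at least
    `min_epochs` distinct epochs. Existing notes are kept as they are.
    """
    counts: Counter = Counter()
    for lessons in epoch_lessons:
        counts.update(set(lessons))  # count each lesson at most once per epoch
    promoted = [
        lesson
        for lesson, seen in counts.items()
        if seen >= min_epochs and lesson not in meta_notes
    ]
    return meta_notes + sorted(promoted)  # sorted for a deterministic order


if __name__ == "__main__":
    lessons_per_epoch = [
        ["prefer short edits", "cite sources"],
        ["cite sources"],
        ["prefer short edits", "avoid vague verbs"],
    ]
    print(slow_update([], lessons_per_epoch))
```

Unlike the epoch-local rejected-edit buffer, the notes returned here would be carried into later epochs, which is the sense in which this plays the role of momentum.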