---
title: "Agent Harness Engineering: A Survey"
created: 2026-05-23
updated: 2026-05-23
type: paper
tags: [agent, infrastructure, harness, taxonomy, survey, production]
sources: [raw/papers/agent-harness-engineering-survey-2026.md]
confidence: high
---

# Agent Harness Engineering: A Survey

> **Core thesis**: The reliability bottleneck of LLM agents in production is not the model itself. It is the **infrastructure layer that wraps the model: the Agent Execution Harness**.

## Basic Information

- **Authors**: Junjie Li, Xi Xiao, Yunbei Zhang, Chen Liu, et al. (CMU × Yale × JHU × NEU × Tulane × UAB × OSU × Virginia Tech × Amazon)
- **Submitted to**: TMLR (Transactions on Machine Learning Research), 2026
- **Project page**: Awesome-Agent-Harness
- **Scale**: 51 pages; 170+ open-source projects mapped
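
## Three Main Contributions

### 1. The Binding-Constraint Thesis

Agent reliability is determined by the engineering quality of the harness, not by the model. The paper supports this thesis with three lines of argument:

- a three-stage engineering evolution (Prompt → Context → Harness);
- a cross-layer synthesis (the trilemma, the capability-control tradeoff, and the coupling problem);
- an agenda of open problems.

Detailed discussion: [[binding-constraint-thesis]]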

### 2. The ETCLOVG Seven-Layer Taxonomy

The paper decomposes the agent harness into seven distinct architectural layers:

- **E**xecution Environment: sandboxes, containers, browser environments
- **T**ool Interface: tool descriptions, discovery, invocation, the MCP protocol
- **C**ontext Management: short-, medium-, and long-term memory; context drift
- **L**ifecycle/Orchestration: single-agent loops, multi-agent coordination
- **O**bservability: tracing, cost, reliability signals
- **V**erification: task evaluation, failure attribution, regression feedback
- **G**overnance: permissions, identity, auditing, human-in-the-loop

Detailed discussion: [[etclovg-taxonomy]]
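
The sketch below makes the layer boundaries concrete. It shows a minimal single-agent loop in which each ETCLOVG layer is a separate component. Everything in it is hypothetical, including the names `run_agent`, `ContextManager`, `Tracer`, `Governance`, `verify` and the scripted `fake_model`. It illustrates the decomposition under stated assumptions; it is not the paper's code or any framework's API. The execution environment, in particular, is a plain in-process function call rather than a real sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: each ETCLOVG layer as a small, separate component.

# E: execution environment (a plain in-process call here, not a real sandbox)
def execute(tool: Callable[[str], str], arg: str) -> str:
    return tool(arg)

# T: tool interface (tool name -> description and callable)
TOOLS: dict[str, tuple[str, Callable[[str], str]]] = {
    "upper": ("Uppercase a string", lambda s: s.upper()),
}

# C: context management (keep only the most recent messages)
@dataclass
class ContextManager:
    max_messages: int = 4
    messages: list[str] = field(default_factory=list)

    def add(self, msg: str) -> None:
        self.messages.append(msg)
        self.messages = self.messages[-self.max_messages:]

# O: observability (records layer-tagged trace events)
@dataclass
class Tracer:
    events: list[tuple[str, str]] = field(default_factory=list)

    def log(self, layer: str, detail: str) -> None:
        self.events.append((layer, detail))

# G: governance (allow-list of tools the agent may call)
@dataclass
class Governance:
    allowed: set[str]

    def permit(self, tool_name: str) -> bool:
        return tool_name in self.allowed

# V: verification (task-specific check on the final answer)
def verify(answer: str) -> bool:
    return answer.isupper()

# L: lifecycle/orchestration (the loop that ties the other layers together)
def run_agent(model, task: str, gov: Governance, max_steps: int = 5):
    ctx, tracer = ContextManager(), Tracer()
    ctx.add(f"task: {task}")
    for _ in range(max_steps):
        action = model(ctx.messages)  # ("call", tool, arg) or ("final", answer)
        if action[0] == "final":
            ok = verify(action[1])
            tracer.log("V", f"verified={ok}")
            return action[1], ok, tracer.events
        _, name, arg = action
        if not gov.permit(name):
            tracer.log("G", f"denied {name}")
            ctx.add(f"denied: {name}")
            continue
        result = execute(TOOLS[name][1], arg)
        tracer.log("E", f"{name}({arg}) -> {result}")
        ctx.add(f"result: {result}")
    tracer.log("L", "step budget exhausted")
    return None, False, tracer.events

# Scripted stand-in for an LLM: call the tool once, then answer.
def fake_model(messages: list[str]):
    last = messages[-1]
    if last.startswith("result: "):
        return ("final", last.removeprefix("result: "))
    return ("call", "upper", "hello")

answer, ok, events = run_agent(fake_model, "shout hello", Governance({"upper"}))
print(answer, ok)  # HELLO True
```

With an empty allow-list (`Governance(set())`), every tool call is denied. The loop then exhausts its step budget, and the trace shows which layer stopped the run. Keeping the layers as separate objects is what lets each one be swapped or tested on its own.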

### 3. Ecosystem Mapping

The paper classifies 170+ open-source projects by ETCLOVG layer, revealing adoption patterns, coverage gaps, and emerging design principles.
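
Mechanically, a mapping like this amounts to tagging each project with the layers it covers and then counting. The sketch below does that: it counts per-layer coverage and flags layers below a threshold as gaps. The project names and tags are made up for illustration and are not the paper's data. The `threshold` parameter is also an assumption.

```python
from collections import Counter

LAYERS = "ETCLOVG"  # one letter per layer, in taxonomy order

def layer_coverage(projects: dict[str, set[str]]) -> Counter:
    """Count how many projects cover each layer; uncovered layers count as 0."""
    counts = Counter({layer: 0 for layer in LAYERS})
    for name, layers in projects.items():
        unknown = layers - set(LAYERS)
        if unknown:
            raise ValueError(f"{name}: unknown layer tags {sorted(unknown)}")
        counts.update(layers)
    return counts

def coverage_gaps(projects: dict[str, set[str]], threshold: int = 1) -> list[str]:
    """Layers covered by fewer than `threshold` projects, in ETCLOVG order."""
    counts = layer_coverage(projects)
    return [layer for layer in LAYERS if counts[layer] < threshold]

# Made-up tags for illustration only (not the paper's data).
projects = {
    "project_a": {"E", "T", "L"},
    "project_b": {"T", "C", "L"},
    "project_c": {"L", "O"},
}
print(coverage_gaps(projects))  # ['V', 'G']
```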
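
## Cross-Layer Synthesis

- **[[cost-quality-speed-trilemma]]**: cost, quality, and speed cannot all be optimized at once. The tradeoff has to be struck differently at different stages of the agent lifecycle.
- **[[capability-control-tradeoff]]**: a stronger harness gives the agent more capability, but every expansion of capability also enlarges the control problem.
- **[[harness-coupling-problem]]**: the harness layers are tightly coupled, so a local optimization can break global behavior. The harness should therefore be tested as a **control system**.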
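
A common way to reason about a three-way tradeoff like the trilemma is a two-step procedure:

1. Discard any harness configuration that another configuration beats or ties on cost, quality, and latency, and strictly beats on at least one (Pareto dominance).
2. Among the survivors, choose per lifecycle stage under that stage's budgets.

This is an assumed illustration, not a method the paper prescribes. The configuration names, numbers, and the `pick` helper below are hypothetical.

```python
from typing import NamedTuple, Optional

class Config(NamedTuple):
    name: str
    cost: float     # e.g. dollars per task (lower is better)
    quality: float  # e.g. task success rate (higher is better)
    latency: float  # e.g. seconds per task (lower is better)

def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on all three axes and strictly better on one."""
    no_worse = a.cost <= b.cost and a.quality >= b.quality and a.latency <= b.latency
    better = a.cost < b.cost or a.quality > b.quality or a.latency < b.latency
    return no_worse and better

def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep only configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]

def pick(front: list[Config], max_cost: float, max_latency: float) -> Optional[Config]:
    """Highest-quality configuration within one lifecycle stage's budgets."""
    feasible = [c for c in front if c.cost <= max_cost and c.latency <= max_latency]
    return max(feasible, key=lambda c: c.quality, default=None)

# Hypothetical configurations with made-up numbers.
configs = [
    Config("small-model", 0.01, 0.60, 2.0),
    Config("small-model+retries", 0.03, 0.70, 6.0),
    Config("large-model", 0.10, 0.85, 5.0),
    Config("large-model+verifier", 0.20, 0.90, 9.0),
    Config("oversized-context", 0.15, 0.80, 8.0),
]
front = pareto_front(configs)
print([c.name for c in front])
# ['small-model', 'small-model+retries', 'large-model', 'large-model+verifier']
print(pick(front, max_cost=0.05, max_latency=10.0).name)  # small-model+retries
```

The front removes only options that are strictly wasteful. Picking among the remaining options is still a judgment call, which is the point of the trilemma. For example, a latency-bound interactive stage (`max_latency=3.0`) ends up with `small-model`, while an unconstrained offline stage gets `large-model+verifier`.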

## Five Open Problems

1. [[hardening-execution-environments]]: hardening and scaling execution environments
2. [[reliable-state-long-running-agents]]: maintaining reliable state in long-running agents
3. [[trace-native-evaluation]]: diagnosing failures from agent traces
4. [[standard-agent-handoffs]]: standardized handoffs among agents, tools, and humans
5. [[adaptive-harness-simplification]]: keeping the harness useful as model capabilities improve
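
Trace-native evaluation works from the agent's own execution trace instead of only the final pass/fail outcome. The sketch below assumes a hypothetical trace format: an ordered list of steps, each tagged with an ETCLOVG layer and a status. The paper does not define this format.

Given such traces, the sketch does two things:

- it attributes each failed run to its earliest failing step;
- it aggregates those attributions by layer across runs.

Treating the first failure as the root cause is a simplifying heuristic chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Step:
    layer: str  # one of "E", "T", "C", "L", "O", "V", "G"
    name: str   # e.g. tool name or phase
    ok: bool

def first_failure(trace: list[Step]) -> Optional[Step]:
    """Heuristic (assumption): treat the earliest failing step as the root cause."""
    return next((s for s in trace if not s.ok), None)

def failures_by_layer(traces: list[list[Step]]) -> Counter:
    """Count runs whose first failure lies in each layer."""
    counts = Counter()
    for trace in traces:
        step = first_failure(trace)
        if step is not None:
            counts[step.layer] += 1
    return counts

# Made-up traces for illustration only.
traces = [
    [Step("T", "search", True), Step("E", "run_tests", False), Step("V", "check", False)],
    [Step("C", "load_memory", False), Step("V", "check", False)],
    [Step("T", "search", True), Step("V", "check", True)],
    [Step("E", "run_tests", False)],
]
print(failures_by_layer(traces).most_common())  # [('E', 2), ('C', 1)]
```

An outcome-only evaluation would report three failed runs out of four and stop there. The trace-level view shows that the Verification failures in the first two runs are downstream of earlier Execution and Context failures.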

## Three-Stage Engineering Evolution

[[prompt-to-harness-evolution]] describes three stages: Prompt Engineering → Context Engineering → Harness Engineering. Each stage builds on the one before it, and the binding constraint moves progressively upward with each stage.

## Key Citations

- Bölük (2026a): "Changing only the harness improved the coding performance of 15 LLMs at once"
- Anthropic (2026a): "Infrastructure setup can measurably change benchmark scores"
- OpenAI (2026): "Harness engineering is the discipline of keeping human attention, repository state, and agent execution aligned"