20260601
raw/papers/agent-harness-engineering-survey-2026.md
@@ -0,0 +1,27 @@
---
source_url: user-upload
ingested: 2026-05-23
sha256: unknown
---

# Agent Harness Engineering: A Survey

## Metadata
- **Authors**: Junjie Li^1,6^*, Xi Xiao^6^*, Yunbei Zhang^5^*, Chen Liu^2^*, Lin Zhao^4, Xiaoying Liao^3, Yingrui Ji^6, Janet Wang^6, Jianyang Gu^7, Yingqiang Ge^9, Weijie Xu^9, Xi Fang^9, Xiang Xu^9, Tianchen Zhao^9, Youngeun Kim^9, Tianyang Wang^6, Jihun Hamm^5, Smita Krishnaswamy^2, Jun Huan^9, Chandan K Reddy^8,9
- **Institutions**: 1 CMU, 2 Yale, 3 JHU, 4 NEU, 5 Tulane, 6 UAB, 7 OSU, 8 Virginia Tech, 9 Amazon
- **Venue**: Under review at TMLR (Transactions on Machine Learning Research), 2026
- **Project Page**: Awesome-Agent-Harness

## Abstract

The rapid deployment of large language model (LLM) agents in production has revealed a recurring pattern: task execution reliability depends less on the underlying model than on the infrastructure layer that wraps it, the **agent execution harness**. This survey provides a practice-grounded, systematic treatment of agent harness engineering, organized around three claims:

1. **Binding-Constraint Thesis**: The agent harness is an independent system layer whose engineering quality drives a large share of real-world reliability
2. **ETCLOVG Taxonomy**: A seven-layer taxonomy (Execution environment, Tool interface, Context management, Lifecycle/Orchestration, Observability, Verification, Governance)
3. **Ecosystem Mapping**: 170+ open-source projects mapped onto this taxonomy
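
As an illustration of how the taxonomy and the ecosystem mapping might be represented in code, the following is a minimal sketch, not the survey's own tooling. The layer names come from the list above; the names `HarnessLayer`, `acronym`, `group_by_layer`, and the two example projects are hypothetical placeholders introduced only for this sketch.

```python
from collections import defaultdict
from enum import Enum


class HarnessLayer(Enum):
    """The seven ETCLOVG layers, listed in acronym order."""
    EXECUTION_ENVIRONMENT = "Execution environment"
    TOOL_INTERFACE = "Tool interface"
    CONTEXT_MANAGEMENT = "Context management"
    LIFECYCLE_ORCHESTRATION = "Lifecycle/Orchestration"
    OBSERVABILITY = "Observability"
    VERIFICATION = "Verification"
    GOVERNANCE = "Governance"


def acronym() -> str:
    """Join the first letter of each layer name, in definition order."""
    return "".join(layer.value[0] for layer in HarnessLayer)


def group_by_layer(projects: dict[str, set[HarnessLayer]]) -> dict[HarnessLayer, list[str]]:
    """Invert a project -> layers mapping into layer -> sorted project names."""
    grouped = defaultdict(list)
    for name, layers in projects.items():
        for layer in layers:
            grouped[layer].append(name)
    # Include every layer, so layers with no mapped project show up as empty lists.
    return {layer: sorted(grouped[layer]) for layer in HarnessLayer}


# Hypothetical placeholder projects; these are NOT entries from the survey's ecosystem map.
EXAMPLE_PROJECTS = {
    "example-sandbox": {HarnessLayer.EXECUTION_ENVIRONMENT},
    "example-tracer": {HarnessLayer.OBSERVABILITY, HarnessLayer.VERIFICATION},
}
```

Allowing a project to map to a set of layers, rather than exactly one, is an assumption of this sketch; it keeps projects that span several layers from being forced into a single bucket.
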

## Key Contributions

- Three-phase engineering evolution: Prompt → Context → Harness Engineering
- Cross-layer synthesis: Cost-Quality-Speed Trilemma, Capability-Control Tradeoff, Harness Coupling Problem
- Open-problem agenda: hardening and scaling execution, maintaining reliable state, diagnosing from traces, standardizing handoffs, and adaptive simplification