

source_url	ingested	sha256
https://arxiv.org/abs/2605.22166	2026-06-11	placeholder

Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents

Authors: Tianshi Xu†, Huifeng Wen†, Meng Li (Peking University); †equal contribution
arXiv: 2605.22166v2 [cs.AI], May 2026
Code: https://github.com/Tianshi-Xu/Life-Harness

Abstract

LLM agents are shaped not only by their language models, but also by the runtime harness that mediates observation, tool use, action execution, feedback interpretation, and trajectory control. While existing agent adaptation methods mainly update model parameters, many failures in deterministic, rule-governed domains stem from mismatches at the model–environment interface. We propose Life-Harness, a lifecycle-aware runtime harness that improves frozen LLM agents without changing model weights or evaluation environments. Life-Harness evolves from training trajectories by converting recurring interaction failures into reusable interventions across environment contracts, procedural skills, action realization, and trajectory regulation, and remains fixed for evaluation on unseen tasks. On seven deterministic environments from τ-bench, τ²-bench, and AgentBench, Life-Harness improves 116 out of 126 model–environment settings across 18 model backbones, with an average relative improvement of 88.5%. Harnesses evolved only from Qwen3-4B-Instruct trajectories transfer to 17 other models, showing that Life-Harness captures reusable environment-side structure rather than model-specific behavior.

Key Contributions

Formulation of harness-based runtime interface adaptation for deterministic LLM agents
Life-Harness: a lifecycle-aware framework with four intervention layers
Cross-model transfer: harnesses evolved only from Qwen3-4B-Instruct trajectories transfer to 17 other models
Complementary to model training: enables Qwen2.5-32B to outperform its tool-use-trained derivative
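
To make the four intervention layers concrete, here is a minimal sketch of a runtime harness wrapped around a frozen model. It is not taken from the Life-Harness codebase: the `Harness` class, its field names, the `strip_action` helper, and the `REPLAN` sentinel are all hypothetical, and the step that evolves the harness from training trajectories is not shown. The sketch only illustrates the idea that the model stays unchanged while the interface around it carries the adaptation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: a frozen LLM viewed as prompt -> raw action text.
Model = Callable[[str], str]


def strip_action(raw: str) -> str:
    """Hypothetical action-realization rule: trim whitespace from raw output."""
    return raw.strip()


@dataclass
class Harness:
    """Illustrative four-layer wrapper; all names are assumptions."""
    # Environment contracts: rules of the environment shown to the model.
    contracts: List[str] = field(default_factory=list)
    # Procedural skills: reusable how-to hints for recurring situations.
    skills: List[str] = field(default_factory=list)
    # Action realization: maps raw model output to an executable action.
    realize: Callable[[str], str] = strip_action
    # Trajectory regulation: how many identical actions in a row are tolerated.
    max_repeats: int = 2

    def step(self, model: Model, observation: str, history: List[str]) -> str:
        # Layers 1 and 2 are injected into the prompt; the model is untouched.
        prompt = "\n".join(self.contracts + self.skills + [observation])
        action = self.realize(model(prompt))
        # Layer 4: break out of a loop of identical actions.
        if history[-self.max_repeats:] == [action] * self.max_repeats:
            action = "REPLAN"
        history.append(action)
        return action
```

In this sketch the harness is built once and then reused unchanged for every task, which mirrors the abstract's statement that the harness remains fixed during evaluation on unseen tasks.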
