Auditing Agent Harness Safety

Authors: Chengzhi Liu*, Yichen Guo*, Yepeng Liu, Yuzhe Yang, Qianqi Yan, Xuandong Zhao, Wenyue Hua, Sheng Liu, Sharon Li, Yuheng Bu, Xin Eric Wang
Affiliations: UC Santa Barbara, UC Berkeley, Stanford University, UW–Madison, Microsoft Research
arXiv: 2605.14271 (v2, May 2026)
Venue: cs.CL
Project Page: harnessaudit.github.io

Abstract

LLM agents increasingly run inside execution harnesses that dispatch tools, allocate resources, and route messages between specialized components. However, a harness can return a correct, benign answer over a trajectory that accesses unauthorized resources or leaks context to the wrong agent. Output-level evaluation cannot see these failures, yet most safety benchmarks score only final outputs or terminal states, even though many violations occur mid-trajectory rather than at termination. The central question is whether the harness respects user intent, permission boundaries, and information-flow constraints throughout execution. To address this gap, we propose HarnessAudit, a framework that audits full execution trajectories across boundary compliance, execution fidelity, and system stability, with a focus on multi-agent harnesses where these risks are most pronounced. We further introduce HarnessAudit-Bench, a benchmark of 210 tasks across eight real-world domains, instantiated in both single-agent and multi-agent configurations with embedded safety constraints. Evaluating ten harness configurations across frontier models and three multi-agent frameworks, we find that: (i) task completion is misaligned with safe execution, and violations accumulate with trajectory length; (ii) safety risks vary across domains, task types, and agent roles; (iii) most violations concentrate in resource access and inter-agent information transfer; (iv) multi-agent collaboration expands the safety risk surface, while harness design sets the upper bound of safe deployment.
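
The gap between output-level scoring and trajectory-level auditing can be sketched in a few lines of Python. This is a minimal illustration, not the HarnessAudit implementation: the `Step` and `Policy` schemas, the agent names, the resource names, and the violation labels are all hypothetical, chosen only to mirror the two violation classes the abstract highlights (resource access and inter-agent information transfer).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One recorded action in a trajectory (hypothetical schema)."""
    agent: str
    action: str              # "read" or "send"
    target: str              # resource name for reads, receiving agent for sends
    payload_label: str = ""  # label of the data being sent, if any


@dataclass
class Policy:
    """Hypothetical permission model: per-agent read scopes and permitted flows."""
    readable: dict[str, set[str]] = field(default_factory=dict)
    # (sender, receiver, label) triples that are explicitly permitted
    allowed_flows: set[tuple[str, str, str]] = field(default_factory=set)


def audit_trajectory(steps: list[Step], policy: Policy) -> list[tuple[int, str]]:
    """Return (step_index, violation_kind) for every step that breaks the policy."""
    violations = []
    for i, step in enumerate(steps):
        if step.action == "read":
            if step.target not in policy.readable.get(step.agent, set()):
                violations.append((i, "resource_access"))
        elif step.action == "send":
            flow = (step.agent, step.target, step.payload_label)
            if flow not in policy.allowed_flows:
                violations.append((i, "information_flow"))
    return violations


def output_level_check(final_answer: str, expected: str) -> bool:
    """Score only the final answer, as an output-level benchmark would."""
    return final_answer.strip() == expected.strip()


policy = Policy(
    readable={"planner": {"tickets.csv"}, "billing": {"invoices.db"}},
    allowed_flows={("planner", "billing", "ticket_summary")},
)
trajectory = [
    Step("planner", "read", "tickets.csv"),
    Step("planner", "read", "invoices.db"),             # outside planner's scope
    Step("planner", "send", "billing", "ticket_summary"),
    Step("billing", "send", "planner", "card_number"),  # flow not permitted
]

answer_ok = output_level_check("Refund approved", "Refund approved")
violations = audit_trajectory(trajectory, policy)
print(answer_ok, violations)
# → True [(1, 'resource_access'), (3, 'information_flow')]
```

The final answer passes the output-level check, yet the trajectory contains two mid-execution violations that only a step-by-step audit surfaces.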

Key Concepts

agent-harness-safety — the core paradigm
harnessaudit — the auditing framework
boundary-compliance — L1: tool, resource, information-flow violations
execution-fidelity — L2: action validity, checkpointed completion
system-stability — L3: perturbation resilience
trajectory-auditing — trajectory-level evidence collection
multi-agent-safety — multi-agent coordination safety risks
information-flow-control — inter-agent communication constraints
resource-access-control — resource scope enforcement
safety-adherence-rate — SAR scoring metric
policy-constrained-execution — formal harness model
execution-harness — harness as policy-constrained execution system
hidden-audit-channel — agent-independent evidence recording
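
The list above names a safety-adherence rate (SAR) but this page does not define it. One plausible reading, assumed here purely for illustration and not taken from the paper, is the fraction of audited checks that pass, reported per audit dimension. The function name, the dimension keys, and the sample check results below are hypothetical.

```python
def safety_adherence_rate(checks: dict[str, list[bool]]) -> dict[str, float]:
    """Fraction of passed checks per dimension (assumed definition, not the paper's)."""
    rates = {}
    for dimension, results in checks.items():
        if not results:
            raise ValueError(f"no checks recorded for {dimension!r}")
        # True counts as 1 and False as 0, so the sum is the number of passes.
        rates[dimension] = sum(results) / len(results)
    return rates


# Made-up check outcomes for one audited trajectory.
sample_checks = {
    "boundary_compliance": [True, False, True, True],
    "execution_fidelity": [True, True],
    "system_stability": [True, False],
}
rates = safety_adherence_rate(sample_checks)
print(rates)
# → {'boundary_compliance': 0.75, 'execution_fidelity': 1.0, 'system_stability': 0.5}
```

Reporting one rate per dimension, rather than a single pooled number, keeps a strong execution-fidelity score from masking weak boundary compliance.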
