20260601
concepts/terminal-bench.md
---
title: "Terminal-Bench"
created: 2026-05-26
type: concept
tags: ["benchmark", "agent-evaluation", "terminal", "coding"]
sources: ["mini-agent-harness"]
---

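# Terminal-Bench

> A benchmark for evaluating agents in a terminal environment: the model is connected to a terminal, where it runs commands, installs dependencies, and debugs errors, and the result is verified by a test script.
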
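The interaction summarized above can be sketched as a simple observe-act loop. This is a minimal illustration under stated assumptions, not the benchmark's own harness: `propose_command` is a hypothetical stand-in for the model, `scripted_model` simply replays a fixed list of commands, a temporary directory stands in for the isolated environment, and `max_turns` and `timeout_s` are arbitrary values.

```python
import subprocess
import tempfile
from typing import Callable, List, Optional, Tuple

# One step of the transcript: (command, exit code, combined stdout+stderr).
Step = Tuple[str, int, str]


def run_agent_loop(
    propose_command: Callable[[List[Step]], Optional[str]],
    workdir: str,
    max_turns: int = 10,      # assumed turn budget
    timeout_s: float = 30.0,  # assumed per-command timeout
) -> List[Step]:
    """Let a hypothetical model issue shell commands until it stops or hits max_turns."""
    history: List[Step] = []
    for _ in range(max_turns):
        cmd = propose_command(history)  # the model sees every earlier output
        if cmd is None:  # None means the model considers the task done
            break
        try:
            proc = subprocess.run(
                cmd, shell=True, cwd=workdir,
                capture_output=True, text=True, timeout=timeout_s,
            )
            # Return both streams so the model can read error messages and debug.
            history.append((cmd, proc.returncode, proc.stdout + proc.stderr))
        except subprocess.TimeoutExpired:
            history.append((cmd, -1, f"timed out after {timeout_s}s"))
    return history


def scripted_model(commands: List[str]) -> Callable[[List[Step]], Optional[str]]:
    """Hypothetical stand-in for an LLM: replays a fixed command list, then stops."""
    def propose(history: List[Step]) -> Optional[str]:
        return commands[len(history)] if len(history) < len(commands) else None
    return propose


if __name__ == "__main__":
    # A temporary directory stands in for the isolated environment.
    with tempfile.TemporaryDirectory() as env:
        model = scripted_model(["echo hello > greeting.txt", "cat greeting.txt"])
        for cmd, code, out in run_agent_loop(model, env):
            print(f"$ {cmd}  (exit {code})  {out.strip()}")
```

Feeding each command's exit code and output back into the next decision is what lets the agent notice a failed install or a stack trace and try a fix on the following turn.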
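## Task Structure

- **Instruction**: the task description given to the agent
- **Isolated Environment**: a sandboxed execution environment in which the agent runs its commands
- **Test Script**: a script that verifies whether the task was completed
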
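The three components above can be sketched as a small data structure plus a grader. This is an illustrative sketch, not Terminal-Bench's real task format: `Task`, `evaluate`, and the convention that the test script signals success with exit code 0 are assumptions, and a fresh temporary directory stands in for the isolated environment.

```python
import subprocess
import tempfile
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    instruction: str  # task description shown to the agent (not used by the grader)
    test_script: str  # shell snippet; assumed convention: exit code 0 == solved


def evaluate(task: Task, agent_commands: List[str], timeout_s: float = 60.0) -> bool:
    """Run the agent's commands, then the test script, in a fresh throwaway directory."""
    # Isolated environment: a new temporary directory per run. A real harness
    # would use a stronger sandbox (for example a container).
    with tempfile.TemporaryDirectory() as env:
        for cmd in agent_commands:
            try:
                # Individual commands may fail; only the final state is graded.
                subprocess.run(cmd, shell=True, cwd=env,
                               capture_output=True, timeout=timeout_s)
            except subprocess.TimeoutExpired:
                pass  # a hung command does not stop grading
        try:
            result = subprocess.run(["sh", "-c", task.test_script], cwd=env,
                                    capture_output=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return False  # a test that hangs counts as a failure
        return result.returncode == 0
```

For example, with `task = Task("Create greeting.txt containing the word hello.", 'grep -qx hello greeting.txt')`, `evaluate(task, ["echo hello > greeting.txt"])` returns `True` and `evaluate(task, ["echo bye > greeting.txt"])` returns `False`. Because only the final environment state is checked, any command sequence that leaves the environment correct passes.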
## Differences from [[swe-bench]]

| Dimension | Terminal-Bench | SWE-bench |
|------|---------------|-----------|
| Environment | Bare terminal | Git repository |
| Task | Command-line operations | Patch generation |
| Verification | Test script | Unit tests |
| Typical use case | System administration / DevOps | Code repair |

## Related Pages

- [[agent-computer-interface]]: the terminal serves as the ACI
- [[agent-harness-mini]]: its task structure can serve as a reference