20260601
23
concepts/swe-bench.md
Normal file
@@ -0,0 +1,23 @@
---
title: "SWE-bench"
created: 2026-05-26
type: concept
tags: ["benchmark", "coding-agent", "software-engineering"]
sources: ["mini-agent-harness"]
---

# SWE-bench

> An agent benchmark for software engineering tasks: real GitHub issue → patch generation → testing in an environment.

## Evaluation workflow

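1. Start from a real issue.
2. The agent generates a patch.
3. The patch is applied in the environment and the tests are run.
4. The harness prepares the environment, applies the patch, runs the tests, and aggregates the results.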
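
The four steps can be sketched as a small loop. This is a minimal illustration under simplifying assumptions, not the actual SWE-bench harness: the names `Task`, `prepare_environment`, `apply_patch`, `evaluate`, and `summarize` are hypothetical, the "repo" is an in-memory dict, and a "patch" is a dict of file replacements rather than a real unified diff applied in an isolated environment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical, simplified types: a repo is path -> file text, and a patch
# is path -> replacement text (an assumption, not a real unified diff).
Repo = Dict[str, str]
Patch = Dict[str, str]


@dataclass
class Task:
    task_id: str
    issue: str                     # issue text shown to the agent
    repo: Repo                     # snapshot of the code before the fix
    tests: Callable[[Repo], bool]  # returns True when the tests pass


def prepare_environment(task: Task) -> Repo:
    """Give each run a fresh copy of the repo."""
    return dict(task.repo)


def apply_patch(repo: Repo, patch: Patch) -> Repo:
    """Overlay the agent's changes on the fresh copy."""
    patched = dict(repo)
    patched.update(patch)
    return patched


def evaluate(tasks: List[Task], agent: Callable[[Task], Patch]) -> Dict[str, bool]:
    """Run every task through prepare -> patch -> test and record pass/fail."""
    results = {}
    for task in tasks:
        env = prepare_environment(task)
        patch = agent(task)
        try:
            results[task.task_id] = bool(task.tests(apply_patch(env, patch)))
        except Exception:
            # A patch that makes the tests crash counts as a failure.
            results[task.task_id] = False
    return results


def summarize(results: Dict[str, bool]) -> float:
    """Fraction of tasks whose tests passed (0.0 when there are no tasks)."""
    return sum(results.values()) / len(results) if results else 0.0


# Toy usage: one task whose bug is a wrong operator in add().
def _tests(repo: Repo) -> bool:
    namespace: dict = {}
    exec(repo["calc.py"], namespace)
    return namespace["add"](1, 2) == 3


task = Task(
    task_id="toy-1",
    issue="add() subtracts instead of adding",
    repo={"calc.py": "def add(a, b):\n    return a - b\n"},
    tests=_tests,
)


def toy_agent(task: Task) -> Patch:
    # Stand-in for a real agent: returns a hard-coded fix.
    return {"calc.py": "def add(a, b):\n    return a + b\n"}


results = evaluate([task], toy_agent)
```

Keeping `agent` as a plain callable means the same loop can score any patch generator. In a real setup, the in-memory copy would be replaced by an isolated checkout and the `tests` callable by the project's own test command.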

## Related pages

- [[terminal-bench]]: terminal-environment evaluation
- [[agent-harness-mini]]: a minimal evaluation framework