20260601
29
concepts/agent-completion-evaluation.md
Normal file
@@ -0,0 +1,29 @@
---
title: "Agent Completion Evaluation"
created: 2026-05-23
updated: 2026-05-23
type: concept
tags: [agent, evaluation, completion, task-completion]
sources: [raw/articles/claw-eval-2026.md]
confidence: high
---

# Agent Completion Evaluation
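
> The Completion dimension of Claw-Eval: it evaluates whether the task was completed and whether the result meets the requirements. It is the most basic and most traditional evaluation dimension.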
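
The two questions in this definition (was the task finished, and does the result meet the requirements) can be sketched as a small check. This is only an illustration under assumptions: `evaluate_completion`, `CompletionResult`, the step names, and the exact-match comparison are hypothetical and are not taken from Claw-Eval.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CompletionResult:
    missing_steps: list[str]  # required steps the agent never performed
    result_correct: bool      # whether the final output equals the expected one

    @property
    def completed(self) -> bool:
        # Complete only if nothing was skipped AND the result is right.
        return not self.missing_steps and self.result_correct


def evaluate_completion(required_steps, executed_steps, expected, actual):
    """Hypothetical completion check (an assumption, not Claw-Eval's method)."""
    executed = set(executed_steps)
    missing = [step for step in required_steps if step not in executed]
    return CompletionResult(missing_steps=missing, result_correct=(actual == expected))


ok = evaluate_completion(["fetch", "parse", "save"], ["fetch", "parse", "save"], 42, 42)
skipped = evaluate_completion(["fetch", "parse", "save"], ["fetch", "save"], 42, 42)
print(ok.completed)           # True
print(skipped.completed)      # False
print(skipped.missing_steps)  # ['parse']
```

Exact equality is the simplest possible notion of "meets the requirements"; a real evaluation would likely need a task-specific comparison.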

## Relationship to the Other Two Dimensions
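
| Dimension | Question | Failure modes |
|-----------|----------|---------------|
| Completion | Was the task finished? | Skipped steps, incorrect results |
| Safety | Was it done safely? | Rule-violating actions, unauthorized calls |
| Robustness | Can it recover when something goes wrong? | Crashing on errors, failing to recover |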

High completion alone does not make a good agent: Safety and Robustness must also meet the bar.
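
One way to read this is as a gate over all three dimensions rather than an average. The sketch below assumes each dimension is scored in the range 0 to 1 and uses an arbitrary threshold of 0.8; the names `DimensionScores` and `passes_all_dimensions`, the score scale, and the threshold are all assumptions for illustration and do not come from Claw-Eval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    completion: float  # assumed scale: 0.0 (worst) to 1.0 (best)
    safety: float
    robustness: float


def passes_all_dimensions(scores: DimensionScores, threshold: float = 0.8) -> bool:
    """Hypothetical gate: the weakest dimension decides the outcome."""
    return min(scores.completion, scores.safety, scores.robustness) >= threshold


finisher = DimensionScores(completion=0.95, safety=0.40, robustness=0.90)
balanced = DimensionScores(completion=0.85, safety=0.90, robustness=0.82)
print(passes_all_dimensions(finisher))  # False: high completion, unsafe
print(passes_all_dimensions(balanced))  # True
```

Taking the minimum instead of the mean keeps a strong Completion score from masking a weak Safety or Robustness score.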

## Related Concepts

- [[agent-safety-evaluation]]
- [[agent-robustness-evaluation]]
- [[claw-eval]]