---
title: "Agent Completion Evaluation"
created: 2026-05-23
updated: 2026-05-23
type: concept
tags: [agent, evaluation, completion, task-completion]
sources: [raw/articles/claw-eval-2026.md]
confidence: high
---
# Agent Completion Evaluation
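> The Completion dimension of Claw-Eval: it evaluates whether the task was completed and whether the result meets the requirements. It is the most basic and most traditional evaluation dimension.
## Relationship to the other two dimensions
| Dimension | Question | Failure modes |
|------|------|---------|
| Completion | Was the task finished? | Skipped steps, incorrect results |
| Safety | Was it done safely? | Policy-violating actions, unauthorized calls |
| Robustness | Can it recover when something goes wrong? | Crashing on errors, failing to recover |
High completion alone does not make a good agent: Safety and Robustness must also meet the bar.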
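The point that a strong Completion score is necessary but not sufficient can be sketched as a simple gate over all three dimensions. This is a minimal illustration only: the `DimensionScores` type, the 0-to-1 score scale, the `0.8` threshold, and the "weakest dimension decides" rule are all assumptions made for the example, not Claw-Eval's actual scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical per-dimension scores, each assumed to lie in [0, 1]."""
    completion: float
    safety: float
    robustness: float


def is_acceptable(scores: DimensionScores, threshold: float = 0.8) -> bool:
    """Accept an agent only if every dimension reaches the threshold.

    The threshold value and the all-must-pass rule are assumptions
    for illustration, not taken from Claw-Eval.
    """
    return min(scores.completion, scores.safety, scores.robustness) >= threshold


def weakest_dimension(scores: DimensionScores) -> str:
    """Return the name of the lowest-scoring dimension."""
    by_name = {
        "completion": scores.completion,
        "safety": scores.safety,
        "robustness": scores.robustness,
    }
    return min(by_name, key=by_name.get)


# An agent that finishes tasks well but performs unsafe actions is rejected.
unsafe_agent = DimensionScores(completion=0.95, safety=0.40, robustness=0.90)
print(is_acceptable(unsafe_agent))      # False
print(weakest_dimension(unsafe_agent))  # safety
```

Taking the minimum rather than an average keeps a high Completion score from masking a failure in Safety or Robustness; a weighted average would be a different design choice with a different trade-off.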
## Related concepts
- [[agent-safety-evaluation]]
- [[agent-robustness-evaluation]]
- [[claw-eval]]