---
title: "Observability & Operations"
created: 2026-05-23
updated: 2026-05-23
type: concept
tags: [agent, observability, monitoring, ops, tracing]
sources: [raw/papers/agent-harness-engineering-survey-2026.md]
confidence: high
---
# Observability & Operations (O Layer)
> The O layer of ETCLOVG: captures traces, cost, failures, and reliability signals. ETCLOVG promotes it to a standalone architectural layer.
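## Why a Standalone Layer?
Observability in production systems already has a dedicated tooling ecosystem and its own engineering practices:
- **Tracing and monitoring platforms**: Langfuse, Arize Phoenix, AgentOps
- **Agent-specific operations platforms**: AgentTrace, OpenLLMetry
- **Cost tracking and optimization**: TensorZero, Axon
- **Reliability engineering**: anomaly detection, failure recovery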
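
The kinds of signals the O layer captures (traces, cost, failures) can be sketched without any of the platforms above. The following is a minimal sketch, assuming a hypothetical `Span` record loosely modeled on OpenTelemetry-style spans and assumed per-token prices; the field names, prices, and the `rollup` function are illustrative, not the data model of any listed tool.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed prices in USD per 1K tokens; real prices vary by model and provider.
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015

@dataclass
class Span:
    """Hypothetical span record, loosely modeled on OpenTelemetry-style spans."""
    trace_id: str
    name: str                 # e.g. "llm.call", "tool.search" (illustrative names)
    duration_ms: float
    status: str = "ok"        # "ok" or "error"
    input_tokens: int = 0
    output_tokens: int = 0

def span_cost(s: Span) -> float:
    """Cost of one span, from its token counts and the assumed prices."""
    return (s.input_tokens * PRICE_PER_1K_INPUT + s.output_tokens * PRICE_PER_1K_OUTPUT) / 1000

def rollup(spans: list[Span]) -> dict[str, dict]:
    """Aggregate cost, span time, and failure counts per trace."""
    per_trace = defaultdict(
        lambda: {"cost_usd": 0.0, "span_time_ms": 0.0, "errors": 0, "spans": 0})
    for s in spans:
        t = per_trace[s.trace_id]
        t["cost_usd"] += span_cost(s)
        # Summed span time; overcounts wall-clock latency if spans nest or run in parallel.
        t["span_time_ms"] += s.duration_ms
        t["errors"] += 1 if s.status == "error" else 0
        t["spans"] += 1
    return dict(per_trace)
```

For example, a trace with one LLM call (2,000 input and 500 output tokens) and one failed tool call rolls up to roughly $0.0135, one error, and two spans under the assumed prices. Real platforms add span nesting, sampling, and provider-specific pricing on top of this basic aggregation.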
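## The Gap Revealed by the Data
The LangChain 2026 survey found that 89% of teams use observability, but only 52.4% run offline evaluations. In other words, teams can **see** what an agent did, but cannot systematically judge whether its behavior was correct.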
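
A quick bound shows how large this gap is. It assumes both figures are shares of the same respondent population; the survey's exact base is not stated here. Let $O$ be the teams using observability and $E$ the teams running offline evaluation:

$$
\begin{aligned}
|O \setminus E| &= |O| - |O \cap E| \\
&\ge |O| - |E| = 89\% - 52.4\% = 36.6\%
\end{aligned}
$$

Under that assumption, at least 36.6% of teams observe their agents without running any offline evaluation. The bound is tight only if every team that evaluates offline also uses observability.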
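## Closing the Loop
Going forward, observability needs to be tightly coupled with the [[verification-evaluation]] layer:
- Convert anomalous production traces into regression cases
- Compute trajectory-quality metrics directly from spans
- Feed diagnostic signals back into prompt, tool, context, and orchestration changes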
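
The first two steps of this loop can be sketched in a few functions. This is a minimal sketch under stated assumptions: the `Span` fields, the three metrics, the thresholds, and the regression-case dictionary shape are all hypothetical choices for illustration, not a method taken from the survey or from any tool named above.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder thresholds; real values would be tuned per application.
MAX_STEPS = 20
MAX_TOOL_ERROR_RATE = 0.2
MAX_REPEATED_TOOL_CALLS = 2

@dataclass
class Span:
    name: str           # e.g. "tool.search" (hypothetical naming)
    kind: str           # "llm" or "tool" (illustrative categories)
    status: str = "ok"  # "ok" or "error"

def trajectory_metrics(spans: list[Span]) -> dict:
    """Compute simple trajectory-quality metrics directly from one trace's spans."""
    tools = [s for s in spans if s.kind == "tool"]
    errors = sum(1 for s in tools if s.status == "error")
    # Back-to-back calls to the same tool: a crude signal that the agent is looping.
    repeats = sum(1 for a, b in zip(tools, tools[1:]) if a.name == b.name)
    return {
        "steps": len(spans),
        "tool_error_rate": errors / len(tools) if tools else 0.0,
        "repeated_tool_calls": repeats,
    }

def within_bounds(m: dict) -> bool:
    """True if a trajectory stays inside the placeholder thresholds."""
    return (m["steps"] <= MAX_STEPS
            and m["tool_error_rate"] <= MAX_TOOL_ERROR_RATE
            and m["repeated_tool_calls"] <= MAX_REPEATED_TOOL_CALLS)

def to_regression_case(trace_id: str, user_input: str, spans: list[Span]) -> Optional[dict]:
    """Turn an anomalous production trace into a regression case; None if it looks normal."""
    m = trajectory_metrics(spans)
    if within_bounds(m):
        return None
    return {"id": f"regress-{trace_id}", "input": user_input, "observed": m}
```

The third step then becomes testable: after a prompt, tool, or context change, re-run each stored case's `input` through the agent and require `within_bounds(trajectory_metrics(new_spans))` on the new trace. Simple threshold rules are used here for readability; a production setup might instead use learned anomaly detectors or LLM-based judges.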
## Related Concepts
- [[etclovg-taxonomy]]
- [[trace-native-evaluation]]: evaluating directly from traces
- [[cost-quality-speed-trilemma]]: trade-off between investment in the O layer and quality/speed
- [[agent-harness-engineering-survey]]