---
title: "Observability & Operations"
created: 2026-05-23
updated: 2026-05-23
type: concept
tags: [agent, observability, monitoring, ops, tracing]
sources: [raw/papers/agent-harness-engineering-survey-2026.md]
confidence: high
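---

# Observability & Operations (O Layer)

> The O layer of ETCLOVG: captures tracing, cost, failure, and reliability signals. ETCLOVG promotes it to an independent architectural layer.

## Why a separate layer?

In production systems, observability already has a dedicated tool ecosystem and its own engineering practice: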
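- **Tracing and monitoring platforms**: Langfuse, Arize Phoenix, AgentOps
- **Agent-specific operations platforms**: AgentTrace, OpenLLMetry
- **Cost tracking and optimization**: TensorZero, Axon
- **Reliability engineering**: anomaly detection, failure recovery

## The gap the data reveals

LangChain 2026 survey: 89% of teams use observability, but only 52.4% run offline evaluations. Teams can therefore **see** what an agent did, but cannot systematically judge whether that behavior was correct.

## Closing the loop

Going forward, observability needs to be tightly coupled with the [[verification-evaluation]] layer: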
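- Turn anomalous production traces into regression cases
- Compute trajectory quality metrics directly from spans
- Feed diagnostic signals back into prompt, tool, context, and orchestration changes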
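
The loop above can be sketched roughly as follows. This is a minimal illustration, not the method of any platform named in this note: the `Span` schema, the metric names, and the anomaly thresholds are all hypothetical assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One step of an agent run (hypothetical schema, not a real SDK type)."""
    trace_id: str
    kind: str                 # assumed values: "llm" or "tool"
    name: str
    duration_ms: float
    cost_usd: float = 0.0
    error: Optional[str] = None
    input: str = ""


def trajectory_metrics(spans: list[Span]) -> dict:
    """Compute simple trajectory quality metrics directly from one trace's spans."""
    tool_spans = [s for s in spans if s.kind == "tool"]
    failed = [s for s in tool_spans if s.error is not None]
    return {
        "steps": len(spans),
        "tool_error_rate": len(failed) / len(tool_spans) if tool_spans else 0.0,
        "total_cost_usd": sum(s.cost_usd for s in spans),
        "total_duration_ms": sum(s.duration_ms for s in spans),
    }


def is_anomalous(metrics: dict, max_steps: int = 20,
                 max_tool_error_rate: float = 0.25) -> bool:
    """Flag a trace using illustrative thresholds (assumed, not recommended values)."""
    return (metrics["steps"] > max_steps
            or metrics["tool_error_rate"] > max_tool_error_rate)


def to_regression_case(spans: list[Span], metrics: dict) -> dict:
    """Freeze an anomalous production trace as an offline regression case."""
    return {
        "trace_id": spans[0].trace_id,
        "input": spans[0].input,
        # Diagnostic signal: which steps failed, so the matching tool or
        # prompt can be reviewed.
        "failing_steps": [s.name for s in spans if s.error is not None],
        "observed_metrics": metrics,
    }


# Example trace: one tool call times out and is retried successfully.
spans = [
    Span("t-1", "llm", "plan", 800.0, cost_usd=0.004, input="Refund order 1182"),
    Span("t-1", "tool", "lookup_order", 120.0, error="timeout"),
    Span("t-1", "tool", "lookup_order", 95.0),
    Span("t-1", "llm", "answer", 600.0, cost_usd=0.003),
]
metrics = trajectory_metrics(spans)
case = to_regression_case(spans, metrics) if is_anomalous(metrics) else None
```

In this example, one of the two tool spans failed, so the trace exceeds the assumed error-rate threshold and is captured as a regression case. Its `failing_steps` field is the kind of diagnostic signal that could be routed back to the owner of the affected tool or prompt.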

## Related concepts

- [[etclovg-taxonomy]]
- [[trace-native-evaluation]]: evaluating from traces
- [[cost-quality-speed-trilemma]]: the trade-off between investment in the O layer and quality/speed
- [[agent-harness-engineering-survey]]