20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/drift-detection.md
+++ b/concepts/drift-detection.md
@@ -0,0 +1,49 @@
+---
+title: "Drift Detection"
+created: 2026-06-10
+updated: 2026-06-10
+type: concept
+tags: [observability, llm, monitoring, drift]
+sources: [raw/articles/pydantic-three-piece-suite-2026.md]
+---
+
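+# Drift Detection
+
+> A trend-detection technique for monitoring how the structure of LLM output changes over time: catching the signal when things "start going wrong at call 32", before "the error at call 47".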
+
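+## Problem
+
+The error patterns in LLM-generated JSON **drift** rather than stay stable:
+- Call 1: a field name is missing its underscore
+- Call 47: an extra, undefined field appears
+- Call 89: a `str` field is filled with `None`
+
+Plain `model_validate` can only tell you that one particular validation failed. It cannot show the **trend**: which fields keep drifting, and which model is the least stable?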
+
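+## Detection dimensions
+
+- **Field drift**: which fields are failing validation more and more often?
+- **Type drift**: which fields show increasingly frequent type mismatches?
+- **Token cost drift**: does a collapse in output format come with a spike in token consumption?
+- **Tool call drift**: is the frequency with which the agent calls a given tool changing abnormally?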
+
+## Implementation
+
+[[logfire|Logfire]] provides drift detection based on SQL queries: instead of clicking filter buttons, you write SQL against the traces:
+
+```sql
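+SELECT tool_name, count(*) AS calls
+FROM traces  -- table and column names are illustrative, not a confirmed Logfire schema
+WHERE start_timestamp >= now() - interval '7 days'  -- last 7 days; timestamp column name is assumed
+GROUP BY tool_name ORDER BY calls DESC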
+```
+
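+## Real-world case
+
+Sophos security team: the agent's calls to one tool rose from 1 per 50 inferences to 1 per 8. Business volume had not grown; the agent had gotten "clever". Conventional logs only reported that each call succeeded, while the SQL query exposed the frequency anomaly.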
+
+## References
+
+- [[logfire|Logfire]]
+- [[agent-observability|Agent Observability]]
+- [[structured-output|Structured Output]]
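
The tool call drift dimension in the note above, and the 1-in-50 to 1-in-8 jump in its real-world case, can be illustrated with a small Python sketch. This is not Logfire's API: `WindowStats`, `tool_call_drift`, and the `max_ratio` threshold are hypothetical names chosen for illustration, and the window counts are assumed to come from whatever query already aggregates calls per time window.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Counts for one time window (hypothetical shape, not a Logfire type)."""
    inferences: int  # total model inferences in the window
    tool_calls: int  # calls to the tool being watched

    @property
    def rate(self) -> float:
        # Calls per inference; 0.0 when the window is empty.
        return self.tool_calls / self.inferences if self.inferences else 0.0


def tool_call_drift(baseline: WindowStats, recent: WindowStats,
                    max_ratio: float = 2.0) -> tuple[float, bool]:
    """Return (rate ratio, drifted?) comparing the recent window to the baseline.

    max_ratio is an assumed threshold; tune it for your own traffic.
    """
    if baseline.rate == 0.0:
        # No baseline calls: any recent call at all counts as drift.
        if recent.rate > 0:
            return (float("inf"), True)
        return (1.0, False)
    ratio = recent.rate / baseline.rate
    # Flag both sharp increases and sharp drops in call frequency.
    drifted = ratio > max_ratio or ratio < 1 / max_ratio
    return (ratio, drifted)


# Numbers from the case above: 1 call per 50 inferences, then 1 per 8.
baseline = WindowStats(inferences=400, tool_calls=8)   # 1 in 50
recent = WindowStats(inferences=400, tool_calls=50)    # 1 in 8
ratio, drifted = tool_call_drift(baseline, recent)
print(f"rate ratio = {ratio:.2f}, drifted = {drifted}")
```

Comparing rates rather than raw counts is what separates "business volume grew" from "the agent changed its behavior": if inferences and tool calls rise together, the ratio stays near 1 and nothing is flagged.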