20260625: lots of new content
42
concepts/hybrid-recall-pipeline.md
Normal file
@@ -0,0 +1,42 @@
---
title: "Hybrid Recall Pipeline (BM25 + Dense)"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["information-retrieval", "hybrid-search", "bm25", "dense-retrieval", "rrf"]
sources:
  - "[[atlas-agent-memory-architecture-2026]]"
---

# Hybrid Recall Pipeline
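
The hybrid recall pipeline of the Atlas memory system runs two retrieval paths in parallel, BM25 lexical retrieval and dense semantic retrieval, then applies Reciprocal Rank Fusion (RRF) and cross-encoder reranking before returning the top-K results.

## Four-stage pipeline

1. **Verbatim Pre-Recall**: the user's original wording is passed through without LLM rewriting, which preserves exact tokens
2. **Dual-path parallel retrieval**:
   - BM25: multi_match across text/title/name/description/trigger_text, with text weighted 2×
   - Dense: Jina v5 embeddings + ES semantic_text kNN
3. **RRF fusion**: rank_constant=30 (favors strong, top-ranked signals), window_size=max(80, k×8)
4. **Cross-encoder reranking**: the Jina v2 reranker scores each (query, candidate) pair in the top 80 and keeps the top-K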
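
Stages 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the Atlas implementation: the field list, the 2× boost on text, rank_constant=30, and the window formula come from the list above, while the function names (`build_bm25_query`, `window_size`, `rrf_fuse`) and the tie-breaking rule are assumptions made for the example. The fusion uses the standard RRF score, the sum over both rankings of 1 / (rank_constant + rank).

```python
from collections import defaultdict

# Fields and the 2x boost on `text` are taken from the note; "^2" is
# Elasticsearch's per-field boost syntax inside a multi_match query.
BM25_FIELDS = ["text^2", "title", "name", "description", "trigger_text"]


def build_bm25_query(user_text: str) -> dict:
    """Build a multi_match query body; user_text is passed verbatim (stage 1)."""
    return {"multi_match": {"query": user_text, "fields": BM25_FIELDS}}


def window_size(k: int) -> int:
    """Candidate window per the note: max(80, k * 8)."""
    return max(80, k * 8)


def rrf_fuse(rankings: list[list[str]], k: int, rank_constant: int = 30) -> list[str]:
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (rank_constant + rank)) over the lists
    it appears in, with 1-based ranks. Ties are broken by doc id, which
    is an assumption made here for determinism.
    """
    window = window_size(k)
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking[:window], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:window]  # candidates handed to the reranker


bm25_hits = ["d3", "d1", "d7"]   # hypothetical BM25 ranking
dense_hits = ["d1", "d2", "d3"]  # hypothetical dense ranking
candidates = rrf_fuse([bm25_hits, dense_hits], k=5)
```

With these toy rankings, d1 and d3 rise to the top because both paths return them, which is the behavior the fusion stage relies on before the reranker cuts the window down to top-K.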

## Ablation contribution breakdown

| Component | Contribution |
|------|------|
| Dense-only | 0.845 |
| BM25-only | 0.708 |
| Full (hybrid) | 0.89 |
| Reranker (single-point) | -0.238 |
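
The gap between the full pipeline and each single path follows directly from the table. The snippet below only does that subtraction, assuming the first three rows are scores on the same (unnamed) metric. The Reranker row is left out because the table does not say how that delta was measured.

```python
# Scores copied from the table above; the metric itself is not named there.
scores = {"dense_only": 0.845, "bm25_only": 0.708, "hybrid": 0.89}


def gain_over(baseline: str) -> float:
    """Absolute gain of the full hybrid pipeline over a single-path baseline."""
    return round(scores["hybrid"] - scores[baseline], 3)


gain_vs_dense = gain_over("dense_only")  # 0.045
gain_vs_bm25 = gain_over("bm25_only")    # 0.182
```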
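
## Key design decisions

- **BM25 cannot be dropped**: exact tokens such as version numbers, error codes, and personal names are caught only by lexical retrieval
- **Dense is the workhorse**: a semantic intent such as "database preference" has no direct match in the documents
- **Query expansion is counterproductive**: BM25 + dense already cover exact and semantic matching, so LLM paraphrasing only introduces noise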
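
The exact-token argument can be made concrete with a toy check. This is a hypothetical illustration only: the regex, the helper names, and both example strings are invented here and are not part of Atlas. It shows how a paraphrase can silently drop the tokens that lexical retrieval depends on.

```python
import re

# Hypothetical pattern for "exact tokens": error codes such as E1042
# and dotted version numbers such as 2.3.1.
EXACT_TOKEN = re.compile(r"\b(?:[A-Z]{1,5}-?\d{2,}|v?\d+(?:\.\d+)+)\b")


def exact_tokens(text: str) -> set[str]:
    """Return the set of error-code / version-like tokens found in text."""
    return set(EXACT_TOKEN.findall(text))


def lost_tokens(original: str, rewritten: str) -> set[str]:
    """Exact tokens present in the original query but missing after rewriting."""
    return exact_tokens(original) - exact_tokens(rewritten)


original = "pip install fails with E1042 on torch 2.3.1"
paraphrase = "package installation error with the latest PyTorch release"
lost = lost_tokens(original, paraphrase)  # {"E1042", "2.3.1"}
```

A rewritten query like the one above still reads as the same intent to a dense encoder, but BM25 no longer has the error code or the version string to match on.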

## References
- [[atlas-agent-memory-architecture-2026]]
- [[bm25-financial-retrieval]]
- [[verbatim-pre-recall]]