20260601
concepts/multimodal-rag.md
@@ -0,0 +1,33 @@
---
title: "Multimodal RAG"
created: 2026-05-21
type: concept
tags: ["rag", "multimodal", "retrieval"]
sources: ["[[when-large-multimodal-models-confront-evolving-knowledge]]"]
---
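
# Multimodal RAG

## Definition

Multimodal RAG (MM-RAG) extends [[rag|retrieval-augmented generation]] to multimodal settings: it retrieves external multimodal knowledge to improve the performance of large multimodal models (LMMs) on knowledge-intensive tasks.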
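
As a rough illustration of the "augment" half of this definition, the sketch below prepends retrieved knowledge to a question before it would be passed to an LMM. `KnowledgeItem`, `build_augmented_prompt`, and the prompt layout are hypothetical names and choices made for this example; the note does not specify a prompt format.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeItem:
    """One entry of a hypothetical external multimodal knowledge base."""
    text: str
    image_path: str


def build_augmented_prompt(question: str, retrieved: list[KnowledgeItem]) -> str:
    """Prepend the retrieved items to the question as numbered context."""
    context = "\n".join(
        f"[{i + 1}] {item.text} (image: {item.image_path})"
        for i, item in enumerate(retrieved)
    )
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Placeholder items; a real system would obtain these from a retriever.
    items = [
        KnowledgeItem("fact one", "a.png"),
        KnowledgeItem("fact two", "b.png"),
    ]
    print(build_augmented_prompt("Who is shown in the image?", items))
```

In a real system the images themselves would also be passed to the LMM; here only their paths are shown to keep the sketch self-contained.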

## Three Retrieval Strategies

| Strategy | Retrieval basis | LLaVA-v1.5 CEM | Qwen-VL-Chat CEM |
|----------|-----------------|----------------|------------------|
| Text-Only | Text features only | 24.05% | 21.79% |
| Image-Only | Visual features only | 25.25% | 22.31% |
| UniIR | Fused multimodal features | **40.68%** | **32.75%** |
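
To make the difference between the three strategies concrete, here is a minimal sketch that ranks toy knowledge items by text similarity only, image similarity only, or a fusion of both. The embeddings are made-up 2-D vectors, and the fusion is a plain average of the two cosine similarities, which is an assumption standing in for UniIR's learned multimodal retrieval, not a description of it.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(query: dict, item: dict, strategy: str) -> float:
    """Score an item against the query under one retrieval strategy."""
    text_sim = cosine(query["text"], item["text"])
    image_sim = cosine(query["image"], item["image"])
    if strategy == "text_only":
        return text_sim
    if strategy == "image_only":
        return image_sim
    if strategy == "fused":
        # Assumption: simple average as a stand-in for learned feature fusion.
        return 0.5 * (text_sim + image_sim)
    raise ValueError(f"unknown strategy: {strategy}")


def retrieve(query: dict, items: list[dict], strategy: str, k: int = 1) -> list[dict]:
    """Return the k highest-scoring items under the given strategy."""
    return sorted(items, key=lambda it: score(query, it, strategy), reverse=True)[:k]


# Toy data: A matches only the text, B only the image, C matches both fairly well.
QUERY = {"text": [1.0, 0.0], "image": [1.0, 0.0]}
ITEMS = [
    {"name": "A", "text": [1.0, 0.0], "image": [0.0, 1.0]},
    {"name": "B", "text": [0.0, 1.0], "image": [1.0, 0.0]},
    {"name": "C", "text": [0.8, 0.6], "image": [0.8, 0.6]},
]

if __name__ == "__main__":
    for strategy in ("text_only", "image_only", "fused"):
        top = retrieve(QUERY, ITEMS, strategy)[0]
        print(strategy, "->", top["name"])
```

On this toy data each single-modality strategy picks an item that matches in only one modality, while the fused score prefers the item that is consistent in both.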
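
## Key Findings

1. MM-RAG outperforms SFT (Full-FT/LoRA), but its best result is only 40.68% CEM, **still far from ideal**
2. UniIR, which retrieves over fused multimodal features, substantially outperforms single-modality retrieval
3. Even when given sufficient context (Sufficient Context), the model still cannot answer perfectly, which shows that the bottleneck is **utilization ability** rather than **retrieval ability**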
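
The first two findings can be checked directly against the CEM values in the strategy table. The short calculation below uses only those reported numbers; the helper names `fusion_gain` and `headroom` are hypothetical and introduced just for this example.

```python
# CEM scores (%) copied from the retrieval-strategy table in this note.
CEM = {
    "LLaVA-v1.5": {"Text-Only": 24.05, "Image-Only": 25.25, "UniIR": 40.68},
    "Qwen-VL-Chat": {"Text-Only": 21.79, "Image-Only": 22.31, "UniIR": 32.75},
}


def fusion_gain(scores: dict[str, float]) -> float:
    """Percentage-point gain of UniIR over the better single-modality strategy."""
    best_single = max(scores["Text-Only"], scores["Image-Only"])
    return round(scores["UniIR"] - best_single, 2)


def headroom(scores: dict[str, float]) -> float:
    """Percentage points between the UniIR score and a perfect 100% CEM."""
    return round(100.0 - scores["UniIR"], 2)


if __name__ == "__main__":
    for model, scores in CEM.items():
        print(f"{model}: gain={fusion_gain(scores)} pts, headroom={headroom(scores)} pts")
```

UniIR gains 15.43 points over the best single-modality strategy on LLaVA-v1.5 and 10.44 points on Qwen-VL-Chat, yet even the best configuration leaves 59.32 points of headroom.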

## See Also

- [[rag|RAG]]
- [[sufficient-context-paradox|Sufficient Context Paradox]]
- [[evolving-knowledge-injection|Evolving Knowledge Injection]]