---
title: "When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations"
authors:
- Kailin Jiang
- Yuntao Du
- Yukai Ding
- Yuchen Ren
- Zhi Gao
- Zilong Zheng
- Ning Jiang
- Lei Liu
- Bin Li
- Qing Li
date: 2026
arxiv: "2505.24449"
venue: "ICLR 2026"
domain: "Multimodal Learning, Knowledge Injection, Continual Learning"
type: paper
source: "https://arxiv.org/abs/2505.24449"
---

# When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations

**Authors**: Kailin Jiang, Yuntao Du, Yukai Ding, Yuchen Ren, Zhi Gao, Zilong Zheng, Ning Jiang, Lei Liu, Bin Li, Qing Li

**Venue**: ICLR 2026

**arXiv**: 2505.24449

## Abstract

Large Multimodal Models (LMMs) store vast amounts of pretrained knowledge but struggle to stay aligned with real-world updates, and they find it difficult to avoid capability degradation when acquiring evolving knowledge. Furthermore, most current work explores static textual knowledge injection and neglects dynamic multimodal evolving knowledge injection. To address this gap, the authors propose MMEVOKE, a benchmark for evaluating LMMs on multimodal evolving knowledge injection, containing 9,422 samples spanning 159 subtypes. Through extensive experiments, they reveal challenges such as poor injection performance and capability degradation, and they introduce knowledge augmentation and knowledge retention methods to address these challenges.

## Key Contributions

1. **MMEVOKE Benchmark**: The first multimodal evolving knowledge injection benchmark, built with a self-evolving data construction pipeline
2. **Dual Challenge Identification**: Poor knowledge adaptation AND capability degradation after injection
3. **Knowledge-Aware Augmentation**: Demonstrates that semantic augmentation strengthens adaptation, while surface-level augmentation is detrimental
4. **Retention Methods**: Data Replay and MoELoRA effectively mitigate degradation; EWC and LwF fail
5. **Sufficient Context Paradox**: Even when given all the necessary information, LMMs still produce incorrect answers
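
As a rough illustration of the Data Replay idea named in the retention methods, the sketch below mixes a random subset of earlier training samples into the new-knowledge fine-tuning set. It shows the generic continual-learning form of replay, not the paper's actual pipeline: the function name `build_replay_mix`, the `replay_ratio` parameter, and its default value are all hypothetical choices made for this example.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_replay_mix(
    new_samples: Sequence[T],
    old_samples: Sequence[T],
    replay_ratio: float = 0.1,  # hypothetical default, not taken from the paper
    seed: int = 0,
) -> List[T]:
    """Return the new samples plus a random subset of old samples, shuffled.

    The number of replayed samples is replay_ratio * len(new_samples),
    capped at the number of old samples available.
    """
    if replay_ratio < 0:
        raise ValueError("replay_ratio must be non-negative")

    rng = random.Random(seed)  # local RNG so the mix is reproducible

    # How many old samples to replay alongside the new knowledge.
    n_replay = min(len(old_samples), round(replay_ratio * len(new_samples)))
    replayed = rng.sample(list(old_samples), n_replay)

    # Interleave old and new so each training batch tends to see both.
    mixed = list(new_samples) + replayed
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    new_data = [("new", i) for i in range(100)]
    old_data = [("old", i) for i in range(50)]
    mixed = build_replay_mix(new_data, old_data, replay_ratio=0.2)
    print(len(mixed), sum(1 for tag, _ in mixed if tag == "old"))
```

The ratio is expressed relative to the new data so that the amount of replay scales with the size of the injection set; whether that matches the setting used in the paper is not stated in this summary.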