20260617: currently 914 pages
raw/papers/ma-intragent-2026.md
@@ -0,0 +1,57 @@
---
title: "IntrAgent: An LLM Agent for Content-Grounded Information Retrieval through Literature Review"
type: raw-paper
arxiv: "2604.22861"
year: 2026
authors: "Fengbo Ma, Zixin Rao, Xiaoting Li, Zhetao Chen, Hongyue Sun, Yiping Zhao, Xianyan Chen, Zhen Xiang"
venue: "arXiv 2026"
code: "https://github.com/FengboMa/IntrAgent"
dataset: "https://huggingface.co/datasets/IntrAgent/IntraBench"
project: "https://intragent.github.io/"
---
# IntrAgent: An LLM Agent for Content-Grounded Information Retrieval through Literature Review

**Authors:** Fengbo Ma*, Zixin Rao*, Xiaoting Li, Zhetao Chen, Hongyue Sun, Yiping Zhao, Xianyan Chen†, Zhen Xiang†
**Affiliation:** University of Georgia, Athens, GA, USA
**arXiv:** 2604.22861
**Date:** April 23, 2026
## Abstract

Scientific research relies on accurate information retrieval from literature to support analytical decisions. In this work, we introduce a new task, INformation reTRieval through literAture reVIEW (IntraView), which aims to automate fine-grained information retrieval faithfully grounded in the provided content in response to research-driven queries, and propose IntrAgent, an LLM-based agent that addresses this challenging task. In particular, IntrAgent is designed to mimic human behaviors when reading literature for information retrieval – identifying relevant sections and then iteratively extracting key details to refine the retrieved information. It follows a two-stage pipeline: a Section Ranking stage that prioritizes relevant literature sections through structural-knowledge-enabled reasoning, and an Iterative Reading stage that continuously extracts details and synthesizes them into concise, contextually grounded answers. To support rigorous evaluation, we introduce IntraBench, a new benchmark consisting of 315 test instances built from expert-authored questions paired with literature spanning five STEM domains. Across seven backbone LLMs, IntrAgent achieves on average 13.2% higher cross-domain accuracy than state-of-the-art RAG and research-agent baselines.
## Key Contributions

1. **IntraView Task**: A novel task for accurate, automated, content-grounded information retrieval from provided scientific literature.
2. **IntrAgent Framework**: An LLM agent with a two-stage pipeline (Section Ranking + Iterative Reading) that mimics human reading behavior.
3. **Hierarchy Preservation**: Leverages structural knowledge of scientific documents for more effective section ranking.
4. **Sufficiency Check**: Mitigates hallucination by explicitly assessing whether the accumulated information is adequate to answer the query.
5. **IntraBench**: The first benchmark for evaluating IntraView, with 315 test instances across five domains (physics, earth science, public health, engineering, material science).
## Method Overview

### Section Ranking

1. **Section Heading Parsing**: Convert the literature to Markdown with MinerU for layout and section detection.
2. **Hierarchy Preservation**: Construct a section tree from the headings using LLM-based hierarchy inference.
3. **Reasoning-Based Ranking**: An LLM ranks sections by relevance to the research question via structure-aware reasoning.
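The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Markdown has already been produced (for example by MinerU), infers the hierarchy from `#` depth instead of using an LLM, and replaces the LLM ranker with a hypothetical keyword-overlap scorer (`keyword_score`) that sees each section's full heading path. All function and class names here are invented for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int
    path: list = field(default_factory=list)  # ancestor titles plus own title
    body: str = ""


def parse_sections(markdown: str) -> list:
    """Split Markdown into sections, recording each heading's ancestor path."""
    sections, stack = [], []
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)$", line)
        if m:
            level, title = len(m.group(1)), m.group(2).strip()
            # Pop headings at the same or deeper level to find the parent.
            while stack and stack[-1].level >= level:
                stack.pop()
            sec = Section(title, level, [s.title for s in stack] + [title])
            stack.append(sec)
            sections.append(sec)
        elif sections:
            sections[-1].body += line + "\n"
    return sections


def keyword_score(question: str, section: Section) -> float:
    """Hypothetical stand-in for the LLM ranker: overlap with the heading path."""
    q = set(re.findall(r"\w+", question.lower()))
    p = set(re.findall(r"\w+", " ".join(section.path).lower()))
    return len(q & p) / (len(p) or 1)


def rank_sections(question, sections, score_fn=keyword_score):
    """Return sections in descending relevance order (ties keep document order)."""
    return sorted(sections, key=lambda s: score_fn(question, s), reverse=True)
```

Scoring on the heading path rather than the bare title is one simple way to expose structure to the ranker: a subsection titled "Setup" is ambiguous alone but not under "Experiments".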
### Iterative Reading

- **Reordered Section Access**: Read sections in descending order of relevance.
- **Section Detail Extraction**: Extract key scientific details (terminology, numbers, experiments, statistics, conclusions).
- **Information Sufficiency Check**: An LLM evaluates whether the accumulated details are sufficient to answer the query; reading then either terminates or continues with the next section.
- **Confidence-Based Reading Styles**: Conservative, balanced (default), and aggressive modes control operational overhead.
- **Final Answer Synthesis**: Synthesize the answer from all accumulated details.
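The reading loop described above might look something like the sketch below. It rests on several assumptions: `extract` and `sufficiency` are hypothetical callables standing in for LLM calls, and the numeric thresholds attached to the three reading styles are invented for illustration. The source does not give threshold values, nor does it say in which direction each style shifts the stopping point.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed thresholds: the confidence needed before reading stops.
# Both the values and their direction are illustrative, not from the paper.
STYLE_THRESHOLDS = {"conservative": 0.9, "balanced": 0.7, "aggressive": 0.5}


def iterative_read(
    question: str,
    ranked_sections: Iterable[str],
    extract: Callable[[str, str], List[str]],
    sufficiency: Callable[[str, List[str]], float],
    style: str = "balanced",
) -> Tuple[List[str], int]:
    """Read sections in ranked order until the evidence looks sufficient.

    Returns the accumulated details and the number of sections read.
    """
    threshold = STYLE_THRESHOLDS[style]
    details: List[str] = []
    sections_read = 0
    for text in ranked_sections:
        sections_read += 1
        details.extend(extract(question, text))
        # Stop early once the (stand-in) sufficiency score clears the bar.
        if sufficiency(question, details) >= threshold:
            break
    return details, sections_read


# Toy stand-ins for the LLM calls, for demonstration only.
def toy_extract(question: str, text: str) -> List[str]:
    """Keep sentences that contain a digit, as a crude proxy for 'key details'."""
    return [s.strip() for s in text.split(".") if any(c.isdigit() for c in s)]


def toy_sufficiency(question: str, details: List[str]) -> float:
    """Pretend confidence grows with the number of details collected."""
    return min(1.0, 0.4 * len(details))
```

In a fuller version, the returned `details` would be passed to one final LLM call for answer synthesis; that step is omitted here because it has no meaningful non-LLM stand-in.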
## Evaluation

- **IntraBench**: 315 test instances across physics, earth science, public health, engineering, and material science.
- **LLM-Grounded Multiple-Choice Evaluation**: An LLM maps generated free-form answers to multiple-choice candidates, addressing synonym and abbreviation mismatches.
- **Baselines**: RAG systems (vanilla RAG, re-ranking, contextual retrieval) and literature agents (PaperQA2, QASA, SciMaster).
- **Results**: On average, 13.2% higher cross-domain accuracy than the baselines across seven backbone LLMs.
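The multiple-choice protocol can be illustrated with a small accuracy computation. This sketch is a stand-in under explicit assumptions: `map_to_choice` is a hypothetical function that uses normalized substring matching where the paper uses an LLM, and the instance fields (`choices`, `gold`) are invented for the example rather than taken from IntraBench.

```python
import re

NONE_OPTION = "None of the above"


def normalize(text: str) -> str:
    """Lowercase and collapse non-alphanumerics so surface variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def map_to_choice(answer: str, choices: list) -> str:
    """Stand-in for the LLM mapper: pick the first choice contained in the answer.

    Falls back to NONE_OPTION when no candidate matches.
    """
    norm_answer = f" {normalize(answer)} "
    for choice in choices:
        if choice != NONE_OPTION and f" {normalize(choice)} " in norm_answer:
            return choice
    return NONE_OPTION


def accuracy(instances: list, answers: list) -> float:
    """Fraction of instances whose mapped choice equals the gold choice."""
    correct = sum(
        map_to_choice(ans, inst["choices"]) == inst["gold"]
        for inst, ans in zip(instances, answers)
    )
    return correct / len(instances)
```

String matching cannot resolve synonyms or abbreviations, which is the gap the LLM-based mapping is meant to close; the sketch only shows where that mapping sits in the scoring pipeline and how a "None of the above" candidate is scored like any other choice.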
## Key Design Insights

- Structural knowledge (section hierarchy) is critical for accurate section ranking; semantic similarity alone is insufficient.
- The sufficiency check prevents both hallucination (answering prematurely on insufficient evidence) and over-reading.
- The framework can handle queries whose answer is not present in the literature, through explicit "None of the above" handling.