title: "IntrAgent: An LLM Agent for Content-Grounded Information Retrieval through Literature Review"
type: raw-paper
arxiv: 2604.22861
year: 2026
authors: Fengbo Ma, Zixin Rao, Xiaoting Li, Zhetao Chen, Hongyue Sun, Yiping Zhao, Xianyan Chen, Zhen Xiang
venue: arXiv 2026
code: https://github.com/FengboMa/IntrAgent
dataset: https://huggingface.co/datasets/IntrAgent/IntraBench
project: https://intragent.github.io/

IntrAgent: An LLM Agent for Content-Grounded Information Retrieval through Literature Review

Authors: Fengbo Ma*, Zixin Rao*, Xiaoting Li, Zhetao Chen, Hongyue Sun, Yiping Zhao, Xianyan Chen†, Zhen Xiang†

Affiliation: University of Georgia, Athens, GA, USA

arXiv: 2604.22861

Date: April 23, 2026

Abstract

Scientific research relies on accurate information retrieval from literature to support analytical decisions. In this work, we introduce a new task, INformation reTRieval through literAture reVIEW (IntraView), which aims to automate fine-grained information retrieval faithfully grounded in the provided content in response to research-driven queries, and propose IntrAgent, an LLM-based agent that addresses this challenging task. In particular, IntrAgent is designed to mimic human behavior when reading literature for information retrieval: identifying relevant sections and then iteratively extracting key details to refine the retrieved information. It follows a two-stage pipeline: a Section Ranking stage that prioritizes relevant literature sections through structural-knowledge-enabled reasoning, and an Iterative Reading stage that continuously extracts details and synthesizes them into concise, contextually grounded answers. To support rigorous evaluation, we introduce IntraBench, a new benchmark consisting of 315 test instances built from expert-authored questions paired with literature spanning five STEM domains. Across seven backbone LLMs, IntrAgent achieves on average 13.2% higher cross-domain accuracy than state-of-the-art RAG and research-agent baselines.

Key Contributions

  1. IntraView Task — A novel task for accurate, automated, and content-grounded information retrieval from provided scientific literature.
  2. IntrAgent Framework — An LLM agent with a two-stage pipeline (Section Ranking + Iterative Reading) that mimics human reading behavior.
  3. Hierarchy Preservation — Leverages structural knowledge of scientific documents for more effective section ranking.
  4. Sufficiency Check — Mitigates hallucination by explicitly assessing whether accumulated information is adequate to answer the query.
  5. IntraBench — The first benchmark for evaluating IntraView, with 315 test instances across five domains (physics, earth science, public health, engineering, material science).

Method Overview

Section Ranking

  1. Section Heading Parsing: Convert literature to Markdown with MinerU for layout/section detection.
  2. Hierarchy Preservation: Construct a section tree from headings using LLM-based hierarchy inference.
  3. Reasoning-Based Ranking: LLM ranks sections by relevance to the research question via structure-aware reasoning.
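
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tree is built from Markdown `#` depth as a stand-in for the LLM-based hierarchy inference, and `keyword_overlap` is a hypothetical toy scorer standing in for the LLM's structure-aware reasoning. All function and class names are invented for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Section:
    title: str
    level: int
    children: List["Section"] = field(default_factory=list)


def build_section_tree(markdown: str) -> Section:
    """Nest headings by '#' depth (assumption: a stand-in for LLM hierarchy inference)."""
    root = Section(title="ROOT", level=0)
    stack = [root]
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if not match:
            continue
        node = Section(title=match.group(2), level=len(match.group(1)))
        # Pop until the top of the stack is a strict ancestor of the new node.
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def section_paths(node: Section, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Flatten the tree into root-to-section title paths, keeping the hierarchy visible."""
    paths = []
    for child in node.children:
        path = prefix + (child.title,)
        paths.append(path)
        paths.extend(section_paths(child, path))
    return paths


def keyword_overlap(question: str) -> Callable[[Tuple[str, ...]], int]:
    """Hypothetical toy scorer: counts question words that appear in the full path."""
    q_words = set(re.findall(r"[a-z]+", question.lower()))

    def score(path: Tuple[str, ...]) -> int:
        p_words = set(re.findall(r"[a-z]+", " ".join(path).lower()))
        return len(q_words & p_words)

    return score


def rank_sections(paths, score):
    """Sort section paths by descending relevance score (ties keep document order)."""
    return sorted(paths, key=score, reverse=True)


doc = "\n".join([
    "# Introduction",
    "# Methods",
    "## Sample Preparation",
    "## Measurement",
    "# Results",
])
paths = section_paths(build_section_tree(doc))
ranked = rank_sections(paths, keyword_overlap("What measurement temperature was used?"))
print(ranked[0])
```

Scoring the full path rather than the bare heading is the point of the sketch: a subsection is judged together with its parents, which is one simple way to expose structural context to the ranker.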

Iterative Reading

  • Reordered Section Access: Read sections in descending relevance order.
  • Section Detail Extraction: Extract key scientific details (terminology, numbers, experiments, statistics, conclusions).
  • Information Sufficiency Check: LLM evaluates whether accumulated details are sufficient; terminates or continues reading.
  • Confidence-Based Reading Styles: Conservative, balanced (default), and aggressive modes to control operational overhead.
  • Final Answer Synthesis: Synthesize answer from all accumulated details.
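
The reading loop described above might look roughly like the sketch below. The `extract`, `sufficiency`, and `synthesize` callables are hypothetical stand-ins for LLM calls, and the numeric thresholds per reading style are assumptions made for illustration; the source does not specify how the three styles are parameterized.

```python
from typing import Callable, List, Tuple

# Assumed mapping: a higher threshold means more evidence is required before stopping.
STYLE_THRESHOLDS = {"aggressive": 0.5, "balanced": 0.75, "conservative": 0.9}


def iterative_reading(
    question: str,
    ranked_sections: List[str],
    extract: Callable[[str, str], List[str]],
    sufficiency: Callable[[str, List[str]], float],
    synthesize: Callable[[str, List[str]], str],
    style: str = "balanced",
) -> Tuple[str, int]:
    """Read sections in ranked order until the sufficiency score reaches the threshold."""
    threshold = STYLE_THRESHOLDS[style]
    details: List[str] = []
    sections_read = 0
    for text in ranked_sections:
        sections_read += 1
        details.extend(extract(question, text))
        if sufficiency(question, details) >= threshold:
            break  # enough evidence accumulated, stop reading early
    return synthesize(question, details), sections_read


# Toy stand-ins for the LLM calls (hypothetical, for demonstration only).
def toy_extract(question: str, text: str) -> List[str]:
    # Keep a section only if it contains a number.
    return [text] if any(ch.isdigit() for ch in text) else []


def toy_sufficiency(question: str, details: List[str]) -> float:
    # Treat two extracted details as fully sufficient.
    return min(1.0, len(details) / 2)


def toy_synthesize(question: str, details: List[str]) -> str:
    return " ".join(details)


sections = [
    "Samples were annealed at 450 K.",
    "Measurements ran for 12 hours.",
    "We thank the funding agency.",
]
answer, n_balanced = iterative_reading(
    "What were the conditions?", sections, toy_extract, toy_sufficiency, toy_synthesize
)
_, n_aggressive = iterative_reading(
    "What were the conditions?", sections, toy_extract, toy_sufficiency, toy_synthesize,
    style="aggressive",
)
print(n_balanced, n_aggressive)
```

In this toy run the balanced style reads two sections and the aggressive style stops after one, which shows how a reading style can trade evidence for overhead.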

Evaluation

  • IntraBench: 315 test instances across physics, earth science, public health, engineering, material science.
  • LLM-Grounded Multiple-Choice Evaluation: LLM maps generated free-form answers to multiple-choice candidates, addressing synonym/abbreviation challenges.
  • Baselines: RAG systems (vanilla RAG, re-ranking, contextual retrieval) and literature agents (PaperQA2, QASA, SciMaster).
  • Results: 13.2% average cross-domain accuracy improvement over baselines across 7 backbone LLMs.
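
As a rough illustration of the evaluation flow, the sketch below maps free-form answers to multiple-choice candidates and averages accuracy over domains. The substring-based `toy_judge` is a hypothetical stand-in for the LLM judge, the records are invented examples rather than IntraBench data, and treating cross-domain accuracy as an unweighted mean of per-domain accuracies is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List

NONE_OPTION = "None of the above"


def toy_judge(answer: str, choices: List[str]) -> str:
    """Stand-in for the LLM judge: first choice whose text appears in the answer."""
    lowered = answer.lower()
    for choice in choices:
        if choice != NONE_OPTION and choice.lower() in lowered:
            return choice
    return NONE_OPTION  # nothing matched, so fall back to the explicit "none" option


def accuracy_by_domain(records: List[dict], judge: Callable = toy_judge) -> Dict[str, float]:
    """Fraction of records per domain where the mapped choice equals the gold choice."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        picked = judge(rec["answer"], rec["choices"])
        total[rec["domain"]] += 1
        correct[rec["domain"]] += int(picked == rec["gold"])
    return {domain: correct[domain] / total[domain] for domain in total}


def macro_average(per_domain: Dict[str, float]) -> float:
    """Unweighted mean over domains (assumed definition of cross-domain accuracy)."""
    return sum(per_domain.values()) / len(per_domain)


# Invented example records, not taken from the benchmark.
records = [
    {"domain": "physics", "answer": "The film was annealed at 450 K for one hour.",
     "choices": ["350 K", "450 K", "550 K", NONE_OPTION], "gold": "450 K"},
    {"domain": "physics", "answer": "The paper does not report the pressure.",
     "choices": ["1 atm", "2 atm", NONE_OPTION], "gold": NONE_OPTION},
    {"domain": "public health", "answer": "The cohort included 1,200 participants.",
     "choices": ["120", "1,200", "12,000", NONE_OPTION], "gold": "1,200"},
    {"domain": "public health", "answer": "Roughly one thousand participants.",
     "choices": ["120", "1,200", "12,000", NONE_OPTION], "gold": "1,200"},
]
per_domain = accuracy_by_domain(records)
overall = macro_average(per_domain)
print(per_domain, overall)
```

The last record shows why a plain string matcher is not enough: "roughly one thousand" never matches "1,200" literally, which is the kind of paraphrase gap the LLM-grounded mapping is meant to address.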

Key Design Insights

  • Structural knowledge (section hierarchy) is critical for accurate section ranking; semantic similarity alone is insufficient.
  • Sufficiency check prevents both hallucination (premature answer with insufficient evidence) and over-reading.
  • The framework can handle queries where the answer is NOT present in the literature (through explicit "None of the above" handling).