title: Arbor: Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
author: Jiajie Jin†‡, Yuyang Hu†, Kai Qiu, Qi Dai, Chong Luo, Guanting Dong, Xiaoxi Li, Tong Zhao, Xiaolong Ma, Gongrui Zhang, Zhirong Wu, Bei Liu, Zhengyuan Yang, Linjie Li, Lijuan Wang, Hongjin Qian, Yutao Zhu, Zhicheng Dou*
source: arXiv 2606.11926v1
date: 2026-06-10
type: paper
venue: arXiv (cs.CL, cs.AI)
tags: autonomous-research, agent, hypothesis-tree, coordinator-executor, ao
code: https://github.com/RUC-NLPIR/Arbor

Arbor: Autonomous Research via Hypothesis-Tree Refinement

Jin†‡, Hu†, Qiu, Dai, Luo, Dong, Li, Zhao, Ma, Zhang, Wu, Liu, Yang, Li, Wang, Qian, Zhu, Dou* | Renmin University / Microsoft Research | arXiv:2606.11926v1 | Jun 2026

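Core Question

How can an AI agent run an explore-experiment-abstract loop over long-horizon autonomous research? Scientific progress depends on repeatedly testing directions, interpreting evidence, and carrying experience forward, yet existing agents treat these as independent local attempts rather than as a cumulative process.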

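Core Framework: Hypothesis Tree Refinement (HTR)

Arbor models autonomous research as Autonomous Optimization (AO): an agent improves an initial research artifact through iterative experiments, without step-level human supervision. The core state is a persistent hypothesis tree: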

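Tree node = research unit ⟨h, ι, µ⟩

h (Hypothesis): a verifiable/falsifiable claim about an improvement
ι (Insight): a reusable interpretation of the evidence; not an execution log, but a compact semantic memory
µ (Metadata): status, score, and git branch/commit reference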
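
The node triple above can be sketched as a small data structure. This is a minimal illustration, not Arbor's actual code: the class names `ResearchUnit` and `Metadata`, the status strings, and the helpers `refine` and `inherited_insights` are all assumptions introduced here to show how a hypothesis, its insight, and its metadata might sit together in a tree.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Metadata:
    """µ: bookkeeping for a node. Field names are illustrative assumptions."""
    status: str = "pending"          # hypothetical states: pending / merged / pruned
    score: Optional[float] = None    # evaluation score, once measured
    git_ref: Optional[str] = None    # branch or commit reference


@dataclass
class ResearchUnit:
    """One tree node <h, ι, µ> (hypothetical layout, not the paper's code)."""
    hypothesis: str                  # h: falsifiable improvement claim
    insight: str = ""                # ι: compact, reusable reading of the evidence
    meta: Metadata = field(default_factory=Metadata)
    # parent is excluded from repr/compare to avoid walking the tree in cycles
    parent: Optional["ResearchUnit"] = field(default=None, repr=False, compare=False)
    children: list["ResearchUnit"] = field(default_factory=list)

    def refine(self, hypothesis: str) -> "ResearchUnit":
        """Attach a more specific child hypothesis under this node."""
        child = ResearchUnit(hypothesis=hypothesis, parent=self)
        self.children.append(child)
        return child

    def inherited_insights(self) -> list[str]:
        """Non-empty insights on the path from the root down to this node."""
        path, node = [], self
        while node is not None:
            if node.insight:
                path.append(node.insight)
            node = node.parent
        return list(reversed(path))
```

Keeping the insight as a short string separate from any execution log mirrors the note's point that ι is semantic memory; a child node can then read its ancestors' insights without replaying their runs.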

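Coordinator ↔ Executor: two roles

Coordinator (long-lived): owns the global tree, manages the search frontier, selects directions, propagates insights, and decides what to merge or prune
Executor (short-lived, isolated worktree): implements and tests a single hypothesis, then returns a structured report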
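
The division of labor between the two roles can be sketched as a loop. This is a toy under stated assumptions, not the paper's algorithm: the names `Report`, `Node`, `Coordinator`, `toy_run`, and `toy_propose` are hypothetical, the executor is modeled as a plain function rather than a process in an isolated worktree, and the "merge if it beats the best score so far, otherwise prune" rule is a stand-in for whatever policy Arbor actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Report:
    """Structured result an executor hands back (fields are assumptions)."""
    score: float
    insight: str


@dataclass
class Node:
    hypothesis: str
    score: Optional[float] = None
    insight: str = ""
    status: str = "pending"
    children: list["Node"] = field(default_factory=list)


# Executor modeled as a function: (hypothesis, inherited insights) -> Report.
Executor = Callable[[str, list[str]], Report]


class Coordinator:
    """Long-lived owner of the tree; the policy here is a toy stand-in."""

    def __init__(self, root: Node, propose: Callable[[Node], list[str]], run: Executor):
        self.root, self.propose, self.run = root, propose, run
        self.frontier: list[Node] = [root]   # nodes waiting to be tested
        self.insights: list[str] = []        # insights propagated to later runs
        self.best: Optional[Node] = None

    def step(self) -> bool:
        """Test one frontier node; return False once the frontier is empty."""
        if not self.frontier:
            return False
        node = self.frontier.pop(0)
        report = self.run(node.hypothesis, list(self.insights))
        node.score, node.insight = report.score, report.insight
        self.insights.append(report.insight)
        if self.best is None or report.score > self.best.score:
            node.status = "merged"           # keep the improvement, expand below it
            self.best = node
            for h in self.propose(node):
                child = Node(h)
                node.children.append(child)
                self.frontier.append(child)
        else:
            node.status = "pruned"           # no gain: stop exploring this branch
        return True


def toy_run(hypothesis: str, insights: list[str]) -> Report:
    # Stand-in executor: the score is just the length of the hypothesis string.
    return Report(score=float(len(hypothesis)), insight=f"tried {hypothesis}")


def toy_propose(node: Node) -> list[str]:
    # Stand-in proposal rule: extend the hypothesis until it has 3 characters.
    return [node.hypothesis + "+"] if len(node.hypothesis) < 3 else []


coordinator = Coordinator(Node("a"), toy_propose, toy_run)
while coordinator.step():
    pass
print(coordinator.best.hypothesis, coordinator.best.score)  # prints: a++ 3.0
```

The executor receives only a hypothesis and a copy of the accumulated insights, and returns only a report, so all tree mutations stay in the coordinator; that separation is the part of the sketch that tracks the note's description.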

Key Results

6 real-world research tasks (model training / harness engineering / data synthesis): best held-out result on every task
vs. Codex / Claude Code: 2.5× average relative held-out gain
MLE-Bench Lite (GPT-5.5): 86.36% Any Medal