20260625: lots of new content

raw/articles/atlas-agent-memory-architecture-2026.md
@@ -0,0 +1,61 @@
---
title: "Atlas Agent Memory Architecture: Three Indices + Hybrid Recall + Post-Write Consolidation"
author: "Atlas Memory System (based on noamschwartz/atlas-memory-demo)"
source: "WeChat Official Account"
date: "2026"
type: article
tags: ["agent-memory", "elasticsearch", "hybrid-retrieval", "consolidation", "bias"]
---

# Atlas Agent Memory System: The Architecture in Full

> In-depth engineering practice: agent memory is not a key-value storage problem; it is a multi-index information retrieval problem.

## Core Thesis

`chat_history.append()` is not a memory system; it is a log file. The real challenge is that information lives in four stores with different lifecycles: the episodic, semantic, and procedural indices, plus a shared catalog. At query time the system must apply the right decay curve to each store and use complementary retrieval channels to find the few entries that matter.

## Atlas Architecture

### Three Indices + Shared Catalog

| Index | Contents | Decay source | Write frequency |
|------|------|--------|---------|
| episodic | Raw messages + timestamps | `timestamp` | Every turn |
| semantic | Consolidated, stable facts | `last_used_at` | During consolidation |
| procedural | Multi-step procedures | Exempt (fixed at 1.0) | During consolidation |
| catalog | Shared public knowledge | `timestamp` | Manual |
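
The table names a decay field for each index but does not specify the curve. The GBrain comparison later in this note describes Atlas's decay as "per-index gauss", so what follows is a minimal sketch of that idea. It assumes the Gaussian formula used by Elasticsearch's `gauss` decay function (`scale`, `offset`, `decay`). The `INDEX_DECAY` scales are hypothetical values, not Atlas's settings.

```python
import math

def gauss_decay(distance: float, scale: float, offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay: 1.0 within `offset`, exactly `decay` at `offset + scale`."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    d = max(0.0, abs(distance) - offset)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))

# Hypothetical per-index scales in days (not taken from Atlas).
INDEX_DECAY = {
    "episodic": {"scale": 7.0},    # decays on the message timestamp
    "semantic": {"scale": 90.0},   # decays on last_used_at
    "catalog": {"scale": 365.0},   # decays on timestamp
}

def decay_factor(index: str, age_days: float) -> float:
    """Freshness multiplier for a hit from `index` that is `age_days` old."""
    if index == "procedural":
        return 1.0  # procedures are exempt from decay
    return gauss_decay(age_days, **INDEX_DECAY[index])

def decayed_score(relevance: float, index: str, age_days: float) -> float:
    """Combine retrieval relevance with index-specific freshness."""
    return relevance * decay_factor(index, age_days)
```

With these illustrative scales, a semantic fact last used 30 days ago keeps most of its weight, while an episodic message of the same age is almost entirely forgotten. The per-index scale is where the "decay is a domain decision" principle actually lives.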

### Retrieval Pipeline

1. Verbatim pre-recall: query with the user's original words, without LLM rewriting
2. BM25 and dense retrieval run in parallel, then the results are fused with RRF (`rank_constant=30`)
3. Cross-encoder reranking (Jina v2, top-80 → top-K)
4. If the reranker fails, fall back to the RRF order
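
Steps 2-4 can be sketched as follows. This minimal illustration assumes standard reciprocal rank fusion, score(d) = Σ 1 / (rank_constant + rank), with 1-based ranks and the listed `rank_constant=30`. The `rerank` callable stands in for the Jina cross-encoder and is hypothetical.

```python
from typing import Callable, Optional, Sequence

def rrf_fuse(rankings: Sequence[Sequence[str]], rank_constant: int = 30) -> list:
    """Reciprocal rank fusion over several ranked lists of document IDs."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

def recall(
    bm25_hits: Sequence[str],
    dense_hits: Sequence[str],
    rerank: Optional[Callable[[list], list]] = None,
    top_n: int = 80,
    top_k: int = 10,
) -> list:
    """Fuse BM25 and dense hits, rerank the top_n, fall back to RRF order on failure."""
    fused = rrf_fuse([bm25_hits, dense_hits])[:top_n]
    if rerank is not None:
        try:
            return rerank(fused)[:top_k]
        except Exception:
            pass  # reranker unavailable or failed: degrade to the RRF order
    return fused[:top_k]
```

With `bm25_hits=["a", "b", "c"]` and `dense_hits=["b", "c", "d"]`, the fused order is `b, c, a, d`. Agreement between the two channels outweighs a single first place.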

### Ablation Results

- **Full**: R@10 = 0.89
- **Dense-only**: R@10 = 0.845
- **BM25-only**: R@10 = 0.708
- **No reranker**: -0.238 (relative to Full)

### Five Code Paths

- `write_memory` (`refresh=True` makes the write visible within the same turn)
- `recall_memory` (hybrid retrieval + reranker)
- Verbatim pre-recall (bypasses the LLM rewriting layer)
- Consolidation (episodic → semantic/procedural)
- Soft supersession (non-destructive handling of contradictions)
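
Soft supersession can be pictured as a chain of links rather than a series of overwrites. Below is a minimal in-memory sketch. The `Fact`/`FactStore` names and the `superseded_by` field are hypothetical and do not reflect Atlas's schema, which stores memories in Elasticsearch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Fact:
    fact_id: str
    text: str
    superseded_by: Optional[str] = None  # link to the newer fact, if any

class FactStore:
    """Keeps every version; a contradiction adds a link instead of deleting."""

    def __init__(self) -> None:
        self.facts: Dict[str, Fact] = {}

    def add(self, fact: Fact) -> None:
        self.facts[fact.fact_id] = fact

    def supersede(self, old_id: str, new_fact: Fact) -> None:
        self.add(new_fact)
        self.facts[old_id].superseded_by = new_fact.fact_id  # old record stays

    def current(self, fact_id: str) -> Fact:
        """Follow the supersession chain to the newest version."""
        fact = self.facts[fact_id]
        while fact.superseded_by is not None:
            fact = self.facts[fact.superseded_by]
        return fact

    def history(self, fact_id: str) -> List[str]:
        """Texts along the chain, starting from `fact_id`."""
        fact = self.facts[fact_id]
        texts = [fact.text]
        while fact.superseded_by is not None:
            fact = self.facts[fact.superseded_by]
            texts.append(fact.text)
        return texts
```

Recall would return only facts whose `superseded_by` is empty, while the full chain stays available for auditing.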
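
## Three General Design Principles

1. **Decay curves are a domain decision.** First define how long each kind of information stays valid, then set the decay parameters.
2. **BM25 and vectors are complementary.** BM25 catches exact tokens and dense retrieval catches semantic intent; neither can replace the other.
3. **Memory needs background consolidation and contradiction handling.** Consolidation turns events into facts; supersession provides non-destructive updates.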
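
The BM25 side of principle 2 is easy to make concrete. Below is a minimal Okapi BM25 scorer. It assumes whitespace tokenization, the Lucene-style IDF log(1 + (N - df + 0.5) / (df + 0.5)), and the common defaults k1 = 1.2 and b = 0.75, which are also Elasticsearch's defaults. The source does not say which analyzer or settings Atlas uses.

```python
import math
from collections import Counter
from typing import List

def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of `query` against each doc (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    df = Counter(term for d in tokenized for term in set(d))  # document frequency
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no lexical match, no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A query for the order ID `ORD-7731` scores only the document that literally contains that token. This is the exact-token match that principle 2 assigns to BM25.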

## Comparison with GBrain

| Dimension | Atlas | GBrain |
|------|-------|--------|
| Storage | Elasticsearch search engine | Markdown + Git |
| Multi-tenancy | ES document-level security (DLS), at the cluster level | Application-level auth |
| Contradiction handling | Soft-supersession chain | Git version history |
| Decay | Per-index Gaussian decay | No explicit decay |
| Debugging transparency | Only through the API | Open the files directly |

raw/articles/financial-llm-practice-2026.md
@@ -0,0 +1,49 @@
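---
title: "LLM Adoption in Finance: From Long-Document Retrieval to Agent Engineering"
author: "Lin Jinshu (林金曙), Chief AI Technology Expert, Hundsun Technologies Research Institute"
source: "DataFun / DAcon Shanghai 2026"
date: "2026"
type: "article"
tags: ["financial-llm", "agent", "rag", "pageindex", "mcp", "context-engineering"]
---

# LLM Adoption in Finance: From Long-Document Retrieval to Agent Engineering

> Lin Jinshu, Chief AI Technology Expert, Hundsun Technologies Research Institute; DAcon Shanghai 2026
> Edited by: Han Shanshan | Community: DataFun

## Summary

The talk systematically reviews the three challenges of deploying LLMs in finance: rigid compliance, data security, and business rigor. Drawing on Hundsun's project experience with brokerages, fund companies, banks, and other institutions, it covers the PageIndex approach to long-document retrieval, the Agentic RAG architecture, and a method for defining a "good requirement" in financial scenarios. It also presents lessons from model selection (Qwen3-32B vs. Qwen3-235B), context-engineering practice, and agent exploration that moves from tool calling toward autonomous planning.

## Core Content

### 1. Three Constraints in Finance

- **Compliance**: every piece of generated content must be traceable, and results require human confirmation
- **Security**: private deployment; data never leaves the domain
- **Rigor**: private data must connect seamlessly with business systems; data quality takes priority over model capability

### 2. Use Cases

- **Institutional operations**: processes that involve 200+ documents → translate natural-language intent into sequences of system operations
- **Investment advisory and wealth management**: compliance judgments on insurance clauses (RAG only solves "understanding"; closing the business loop requires calling system APIs)
- **Custody operations**: automated review of disclosure reports (automating rules such as net asset value checks and cross-figure reconciliation)
- **Investment banking**: Mixue Bingcheng's 1,300-page prospectus → the PageIndex approach

### 3. Core Engineering Practices

- **PageIndex**: uses the document's table of contents to build a "section title ↔ page range" mapping, narrowing retrieval from 300 pages to 3
- **Agentic RAG**: decomposes a task into sub-questions, dynamically calls PageIndex/BM25/vector retrieval, and self-assesses whether the gathered information is sufficient
- **Vectorless retrieval**: financial queries involve many exact matches (security codes, proper nouns, numbers), where BM25 outperforms vector retrieval
- **Three elements of a good requirement**: where to look (restrict to specific sections), what to look for (stated in business language), and how to judge (executable SOP conditions)
- **Model-selection lesson**: Qwen3-32B → 530 rules / 4,300 lines of code / three people quit; Qwen3-235B → rules cut in half, accuracy +45 pp
- **Context engineering**: prompt compressed from 24K tokens to 3K, with 180 financial indicators injected on demand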
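
The PageIndex idea of mapping section titles to page ranges from the table of contents can be sketched as follows. This is an illustrative reconstruction, not Hundsun's implementation. The TOC entries are hypothetical, and keyword matching on titles stands in for the LLM or agent that would actually choose the relevant sections.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass
class TocEntry:
    title: str
    start_page: int

def build_page_index(toc: Iterable[TocEntry], total_pages: int) -> Dict[str, Tuple[int, int]]:
    """Map each section title to its inclusive (start, end) page range."""
    entries = sorted(toc, key=lambda e: e.start_page)
    index = {}
    for i, entry in enumerate(entries):
        # A section ends where the next one begins; the last one runs to the end.
        end = entries[i + 1].start_page - 1 if i + 1 < len(entries) else total_pages
        index[entry.title] = (entry.start_page, end)
    return index

def pages_for(index: Dict[str, Tuple[int, int]], keywords: List[str]) -> List[int]:
    """Pages of every section whose title mentions one of the keywords."""
    pages = set()
    for title, (start, end) in index.items():
        if any(k.lower() in title.lower() for k in keywords):
            pages.update(range(start, end + 1))
    return sorted(pages)
```

Only the selected pages, rather than the whole prospectus, would then be passed to the model or to BM25.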
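
### 4. Agent Exploration

- Four shortcomings of OpenClaw in financial scenarios: vague permissions, insufficient auditing, unmanaged plugins, and no safety net for hallucinations
- Atomic Skills + access through the MCP protocol
- LLM-friendly retrofitting of interfaces (business semantics, time tags, function descriptions)

### 5. Key Takeaways

- "Don't compete on weaving speed; compete on mastery of the machine"
- "Deliver Lego-style Skills, and deliver the assembled Lego car"
- "Shift from code producer to business reviewer"
- "Drop the fight over the brain; build the foundation of the nervous system"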

raw/articles/liyuanyuan-llm-spiral-of-silence-2026.md
@@ -0,0 +1,41 @@
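---
title: "The LLM Spiral of Silence: When Algorithms Breed Digital Conformity"
author: Li Yuanyuan (李媛媛)
source: DatapiTHU (数据派THU)
date: 2026
url: https://mp.weixin.qq.com/s/ZKrx4BzmiOUBsfPVY9YHyw
type: article
tags:
- spiral-of-silence
- llm
- rag
- multi-agent
- rlhf
- content-ecology
---

## Summary

This article surveys the LLM spiral of silence. LLMs have no human psychological motives, yet their underlying statistical generation mechanisms alone can spontaneously produce a "spiral of silence" in which opinions converge, minority truths fall silent, and content becomes highly homogeneous. Starting from classic communication theory, the article analyzes two empirical settings: iterative closed-loop RAG and multi-agent interaction. It breaks down four technical root causes (statistical preferences from pretraining, anchoring on historical context, fixed role settings, and amplification by RLHF alignment) and proposes a three-part governance approach covering technology, mechanisms, and research.

## Core Claims

- The LLM spiral of silence is a **general, systemic problem across all mainstream LLMs** (GPT, Llama, Tongyi Qianwen, DeepSeek, and others); models differ only in how strong the effect is
- It arises spontaneously from purely statistical language generation, with no human psychological motive required
- The AI spiral of silence is more covert, iterates faster, and suppresses more strongly than its counterpart in human society
- The effect is more pronounced in small models, Chinese-language models, and RLHF-aligned models

## Key Experimental Findings

1. **Closed-loop RAG**: after 5 iterations, the share of human-written content drops from 50% to below 15%; search-engine ranking algorithms inherently favor AI-generated text
2. **Multi-agent interaction**: when historical context and role settings are combined, the mainstream view's share exceeds 80% and minority views are completely suppressed
3. **Model differences**: the effect is stronger in small-parameter models than in large ones, and stronger in Chinese-language models than in English-language ones

## References

[1] Spiral of Silence: How is Large Language Model Killing Information Retrieval? ACL, 2024.
[2] Spiral of Silence in Large Language Model Agents. arXiv, 2025.
[3] Noelle-Neumann, E. The Spiral of Silence: Public Opinion, Our Social Skin. 1984.
[4] Creativity Has Left the Chat: The Price of Debiasing Language Models. arXiv, 2024.
[5] Quantifying and Mitigating the Spiral of Silence in Recommender Systems. Knowledge-Based Systems, 2026.
[6] Zhou Baohua (周葆华). Online Public Opinion Processes and Dynamic Evolution: An Analysis Based on Computational Communication Research. Journal of Northwest Normal University, 2019.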

raw/articles/memtensor-memos-agent-memory-2026.md
@@ -0,0 +1,63 @@
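---
title: "MemOS: Agent Memory Systems, from Efficiency Tool to Survival Essential"
created: 2026-06-19
updated: 2026-06-19
type: article-raw
source: https://mp.weixin.qq.com/s/5Wo91nzstNtCIV9chnuQmw
speaker: Xiong Feiyu (熊飞宇)
company: MemTensor (记忆张量)
publisher: DataFun
---

# MemOS: Agent Memory Systems, from Efficiency Tool to Survival Essential

**Speaker**: Xiong Feiyu, founder and CEO of MemTensor (Shanghai) Technology Co., Ltd., and head of the Large Model Center at the Institute for Advanced Algorithms Research, Shanghai
**Community**: DataFun

## Key Thesis

Memory is becoming the biggest weakness of AI agents. After ChatGPT launched personal memory and continuous agents such as OpenClaw appeared, the industry reached a consensus: memory is no longer a nice-to-have but the core factor in whether an agent can keep evolving.

## Overview

### 1. The Evolution of Memory: From Efficiency Tool to Make-or-Break

- ChatGPT's memory feature: personalized understanding is key in the AGI era
- The arrival of OpenClaw: without a good memory system, long-horizon agent tasks cannot run smoothly
- From single-session to multi-session/multi-user/multi-agent/multi-app, complexity grows exponentially

### 2. Two Technical Paths

- **Model-driven**: architectural innovations such as Memorizing Transformers; very expensive, with high risk of failure
- **Application-driven**: simulating memory through prompts/agent flows (Mem0, Zep); lightweight but loosely integrated with the model
- **MemTensor's approach**: combine the two paths; the model-driven path sets the ceiling, and the application-driven path sets the floor

### 3. MemOS Five-Layer Architecture

- Memory storage layer: MemCube (the smallest unit of memory) + MemStore (a marketplace for trading memories)
- Memory governance layer: access control, lifecycle, watermarking, privacy
- Memory scheduling layer: the core, coordinating three layers of memory (plaintext, activation, and parameter)
- Encoding/decoding layer + application layer

### 4. Coordinating the Three Layers of Memory

- **Plaintext memory** (Explicit): handled through prompts/agent flows; the industry mainstream
- **Activation memory** (Activation): KV-cache management, optimizing cache hit rate and token consumption
- **Parameter memory** (Parameter): industry know-how injected into the LLM through post-training

### 5. Platform Scale

- 8.5K GitHub stars; 12,000+ active community users
- 25M+ cloud-service calls per month, growing 100-200% month over month
- 45-72% token savings per request

### 6. How MemOS Enhances OpenClaw (Six Dimensions)

- Storage types, retrieval (multi-path recall/time decay/deduplication), evolution (Mem2Skill), visualization, collaboration (Hub)
- Three-stage deduplication funnel: SHA-256 → vector cosine similarity → LLM judge
- Average compression ratio above 75%; token consumption cut by nearly 50%
- Core innovation, Mem2Skill: memory is not merely retrieved but internalized as a capability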
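
The three-stage deduplication funnel orders its checks from cheapest to most expensive. This minimal sketch assumes that a high cosine similarity only makes an item a candidate duplicate, which the LLM judge must then confirm. `toy_embed`, the 0.9 threshold, and the `judge` callable are hypothetical stand-ins for a real embedding model and a real LLM call.

```python
import hashlib
import math
from typing import Callable, List, Sequence, Set, Tuple

Vector = List[float]

def toy_embed(text: str) -> Vector:
    """Stand-in embedding: letter counts a-z (a real system would use a model)."""
    text = text.lower()
    return [float(text.count(chr(c))) for c in range(ord("a"), ord("z") + 1)]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class DedupFunnel:
    def __init__(self, embed: Callable[[str], Vector],
                 judge: Callable[[str, str], bool], threshold: float = 0.9) -> None:
        self.embed, self.judge, self.threshold = embed, judge, threshold
        self.hashes: Set[str] = set()
        self.items: List[Tuple[str, Vector]] = []

    def is_duplicate(self, text: str) -> bool:
        # Stage 1: exact duplicate via SHA-256 of the raw text.
        if hashlib.sha256(text.encode("utf-8")).hexdigest() in self.hashes:
            return True
        vec = self.embed(text)
        for other, other_vec in self.items:
            # Stage 2: cheap vector filter; Stage 3: the judge confirms.
            if cosine(vec, other_vec) >= self.threshold and self.judge(text, other):
                return True
        return False

    def add(self, text: str) -> bool:
        """Store `text` unless it duplicates an existing memory."""
        if self.is_duplicate(text):
            return False
        self.hashes.add(hashlib.sha256(text.encode("utf-8")).hexdigest())
        self.items.append((text, self.embed(text)))
        return True
```

Because the judge only runs on pairs that pass the vector filter, the expensive LLM call is reserved for genuinely ambiguous cases.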
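
### 7. ClawForce Enterprise Product

- Addresses five pain points: hard deployment, scattered experience, missed responses, limited scenarios, and untraceable data
- Five-layer design: intelligent hub + memory layer + Skill engine + event listeners + tool connectors
- Three layers of security: isolation beforehand → masking and encryption during use → auditing afterward
- Scenarios: end-to-end R&D automation, 24/7 e-commerce monitoring, official document writing (85% less time), sales (customer outreach doubled)

### 8. All-in-One Appliance Options

- NVIDIA DGX appliance (128 GB of memory shared between GPU and system)
- China Telecom option built on domestic compute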

raw/articles/michael-jordan-mlst-collectivist-ai-2026.md
@@ -0,0 +1,64 @@
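---
title: "Michael I. Jordan on the Collectivist Economics of AI (MLST Interview)"
author: Michael I. Jordan (interviewee), Tim Scarfe (host)
source: Jiqizhixin (机器之心) translation, MLST (Machine Learning Street Talk)
date: 2026
url: https://mp.weixin.qq.com/s/VEo23R0yst6wjdyzVicYUQ
original: https://www.youtube.com/watch?v=AREWYbVtX64
paper: https://arxiv.org/pdf/2507.06268
type: article
tags:
- michael-jordan
- ai-economics
- collectivist-ai
- uncertainty
- agi-critique
---

## Summary

Michael I. Jordan, a founding figure of statistical machine learning whose former students include Andrew Ng and Yoshua Bengio, discusses his paper *A Collectivist, Economic Perspective on AI* in depth in an MLST interview. His core argument runs as follows. Today's AI narrative is dominated by a metaphor of individual cognition (the brain as a computer) and ignores the social and economic nature of intelligence and its uncertainty. A complete framework for intelligent systems needs economics and the social sciences. AGI is a PR term, and "superintelligence vs. human extinction" is a false dichotomy, with countless positive possibilities between the two poles.

## Background on Michael I. Jordan

- Distinguished Professor of EECS and Statistics at UC Berkeley; researcher at Inria Paris
- Named the world's most influential computer scientist in a 2016 ranking by *Science*
- Students: Andrew Ng, Yoshua Bengio, Zoubin Ghahramani, Eric Xing, David Blei, and others
- Fields: graphical models, variational inference, Bayesian nonparametrics

## Core Arguments

### 1. AGI Is a PR Term

"AGI is just a PR term. It's a distortion." The return of the "AI" label, which came with the rise of LLMs, has distorted research directions and business models. The real machine learning tradition (decision trees, logistic regression, supply-chain forecasting) has existed all along and has had greater impact, but it is overlooked because it produces no "human-readable output".

### 2. AI Needs Economics: A Collectivist Framework

The fundamental flaw of mainstream AI thinking is that it **narrows intelligence to individual cognition** (brain metaphor → neurons → gradient descent) and ignores that humans are social animals. The framework rests on a triangle:

- [[collectivist-ai|CS + statistics + economics]]

"With only computation plus optimization, all you get is a language model. Bring in statistical and economic thinking, and you start to get complete, systemic thinking."

### 3. Stop Anthropomorphizing Machines

Don't ask "does it understand?" Ask instead: can it reduce uncertainty, can engineering systems be built on top of it, and can it make planning possible? [[anthropomorphization-critique|Anthropomorphizing machines]] systematically diverts attention from the engineering questions that really matter: failure conditions, error bounds, integration with real data, and who bears the consequences when it is wrong.

### 4. Foundation Models Are Most Dangerous at the Frontier of Knowledge

[[foundation-model-frontier-bias|Foundation-model frontier bias]]: scientists care about new problems at the frontier of knowledge, which is exactly where foundation models have the least training data and the greatest bias. In the AlphaFold case, predictions of quantum fluctuations came with extremely narrow confidence intervals yet missed the true value entirely. The remedy is [[prediction-driven-inference|prediction-powered inference]]: combine a small amount of real labeled data with a large number of model predictions.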
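
For a mean, prediction-powered inference has a simple form. You average the model's predictions on the large unlabeled set, then correct that average by the mean prediction error (the "rectifier") measured on the small labeled set. The sketch below implements that standard PPI mean estimator with a normal-approximation confidence interval. It illustrates the general technique, not anything specific to the interview, and its inputs are plain lists.

```python
import math
import statistics
from typing import List, Tuple

def ppi_mean(
    preds_unlabeled: List[float],
    preds_labeled: List[float],
    labels: List[float],
    z: float = 1.96,
) -> Tuple[float, Tuple[float, float]]:
    """Prediction-powered estimate of E[Y] with an approximate 95% interval."""
    n, big_n = len(labels), len(preds_unlabeled)
    # Rectifier: how far the model is off on the labeled data.
    rectifier = [y - f for f, y in zip(preds_labeled, labels)]
    estimate = statistics.fmean(preds_unlabeled) + statistics.fmean(rectifier)
    # Variance combines prediction noise (large set) and rectifier noise (small set).
    se = math.sqrt(statistics.variance(preds_unlabeled) / big_n
                   + statistics.variance(rectifier) / n)
    return estimate, (estimate - z * se, estimate + z * se)
```

If the model is biased upward by one unit, the naive mean of its predictions is off by one, and the rectified estimate removes that bias. When labeled data are scarce or the model is erratic, the interval widens instead of staying misleadingly narrow.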
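
### 5. Superintelligence vs. Human Extinction Is a False Dichotomy

"Thought leaders splitting into two camps, one racing toward utopia and one toward doom: that degree of disconnect from reality is very rare in human history." Young people lack role models for "making the world a little better by building genuinely useful things." There are countless positive things to do between the two extremes.

## Jordan's Three-Way Taxonomy of Uncertainty

[[uncertainty-taxonomy|Taxonomy of uncertainty]] (going beyond the classic epistemic/aleatoric split):

1. **Sampling uncertainty**: is there enough data? In social settings, however, it must be treated in terms of Nash equilibria (the duck analogy)
2. **Information asymmetry**: structural opacity that does not go away (the domain of economics)
3. **Data timeliness (provenance)**: the temporal metadata of data should be factored quantitatively into uncertainty calculations

## References

- Jordan, M.I. *A Collectivist, Economic Perspective on AI*. arXiv:2507.06268.
- MLST interview: https://www.youtube.com/watch?v=AREWYbVtX64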

raw/articles/nobrega-ai-production-tradeoffs-2026.md
@@ -0,0 +1,62 @@
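---
title: "Six Choices Every AI Engineer Has to Make (and Nobody Teaches)"
created: 2026-06-19
updated: 2026-06-19
type: article-raw
source: https://towardsdatascience.com/six-choices-every-ai-engineer-has-to-make-and-nobody-teaches/
wechat: https://mp.weixin.qq.com/s/GESoyR0qpxP4fPtHZjonKA
translator: Chen Chao (陈超)
publisher: DatapiTHU (数据派THU)
---

# Six Choices Every AI Engineer Has to Make (and Nobody Teaches)

**Author**: Sara Nobrega
**Translator**: Chen Chao (M.A. in Applied Psychology, Peking University)
**Published by**: DatapiTHU (数据派THU)
**Original**: Towards Data Science

## Core Theme

Six key trade-offs in production AI, each backed by recent research.

## The Six Trade-offs

### 1. Build vs. Buy

- Three options: call an API, fine-tune an open-source model, or build and host your own
- Under 100K requests/day → API (GPT-4o Mini)
- Over 1M requests/day → self-host (but note: people account for 70-80% of the cost, GPUs only 20-30%)
- Teams overrun their LLM budgets by 340% on average, mainly because they lack usage tracking and cost attribution

### 2. Model Complexity vs. Maintainability

- CACE principle: Changing Anything Changes Everything (Sculley et al., 2015)
- Data dependencies cost more than code dependencies
- Choosing a more complex model for a 2% accuracy gain → paying an 18-month debugging tax

### 3. Data Quantity vs. Data Quality

- Beyond a noise threshold, more low-quality data degrades performance
- The "data swamp" problem: storage is cheap → everything gets stored → cleanup costs explode
- Medical AI: a small dataset labeled by experts beats a large dataset with unreliable labels

### 4. Throughput vs. Latency (Batch vs. Real-Time)

- Batch: predictions generated on a schedule; low cost and simple, but predictions may be stale
- Real-time: on demand with millisecond latency; expensive and requires 24/7 operations
- Most business problems do not need sub-second predictions

### 5. Prompt Engineering vs. Fine-Tuning

- Prompt engineering: fast, cheap, and flexible, but brittle
- Fine-tuning: expensive (about $10K and 6 weeks for a GPT-4o customer-support model) but reliable at scale
- DSPy prompt optimization beats fine-tuning by 6-19 percentage points on some benchmarks
- A hybrid pattern is emerging: fine-tuning for style/tone + RAG for factual grounding

### 6. Automation vs. Human Oversight (HITL)

- Full human review does not scale
- Selective HITL: bring in humans only for edge cases, low-confidence outputs, and high-stakes decisions
- AI handles scale, speed, and pattern recognition; humans handle irreversibility
- In healthcare, finance, and law, HITL is often a compliance requirement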
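
Selective HITL comes down to a routing rule. In this minimal sketch, the 0.85 confidence threshold and the `high_risk`/`edge_case` flags are hypothetical; in practice they would come from domain policy.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Decision:
    label: str
    confidence: float
    high_risk: bool = False   # e.g. an irreversible or regulated action
    edge_case: bool = False   # e.g. an out-of-distribution input

def needs_human(d: Decision, min_confidence: float = 0.85) -> bool:
    return d.high_risk or d.edge_case or d.confidence < min_confidence

def route(decisions: List[Decision], min_confidence: float = 0.85) -> Tuple[List[Decision], List[Decision]]:
    """Split decisions into (automated, sent to human review)."""
    automated, review = [], []
    for d in decisions:
        (review if needs_human(d, min_confidence) else automated).append(d)
    return automated, review
```

Tracking the size of the review queue over time shows whether the threshold keeps human review within the team's capacity.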
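
## Core Principle

> In production, the cost of a decision rarely shows up where the decision is made.

The cost of complexity is paid later. A more complex model adds maintenance cost six months down the line, the 24/7 infrastructure behind a real-time system costs more over the long run, and large volumes of dirty data are paid for in every retraining cycle.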

raw/papers/arbor-htr-2026.md
@@ -0,0 +1,38 @@
---
title: "Arbor: Toward Generalist Autonomous Research via Hypothesis-Tree Refinement"
author: "Jiajie Jin†‡, Yuyang Hu†, Kai Qiu, Qi Dai, Chong Luo, Guanting Dong, Xiaoxi Li, Tong Zhao, Xiaolong Ma, Gongrui Zhang, Zhirong Wu, Bei Liu, Zhengyuan Yang, Linjie Li, Lijuan Wang, Hongjin Qian, Yutao Zhu, Zhicheng Dou*"
source: "arXiv 2606.11926v1"
date: "2026-06-10"
type: paper
venue: "arXiv (cs.CL, cs.AI)"
tags: ["autonomous-research", "agent", "hypothesis-tree", "coordinator-executor", "ao"]
code: "https://github.com/RUC-NLPIR/Arbor"
---

# Arbor: Autonomous Research via Hypothesis-Tree Refinement

> Jin†‡, Hu†, Qiu, Dai, Luo, Dong, Li, Zhao, Ma, Zhang, Wu, Liu, Yang, Li, Wang, Qian, Zhu, Dou*
> Renmin University / Microsoft Research | arXiv:2606.11926v1 | Jun 2026

## Core Question

How can an AI agent run the explore-experiment-abstract loop in long-horizon autonomous research? Scientific progress depends on repeatedly testing directions, interpreting evidence, and carrying lessons forward, but existing agents treat these as isolated, local attempts rather than as a cumulative process.

## Core Framework: Hypothesis Tree Refinement (HTR)

Arbor models autonomous research as **Autonomous Optimization (AO)**: the agent improves an initial research artifact through iterative experiments, without step-level human supervision. The core state is a persistent hypothesis tree:

### Tree Nodes = Research Units ⟨h, ι, µ⟩

- **h (Hypothesis)**: a verifiable/falsifiable claim of improvement
- **ι (Insight)**: a reusable interpretation of the evidence; not an execution log, but compact semantic memory
- **µ (Metadata)**: status, score, and git branch/commit references

### Coordinator ↔ Executor Roles

- **Coordinator** (long-lived): owns the global tree; manages the search frontier, selects directions, propagates insights, and decides on merging/pruning
- **Executor** (short-lived, in an isolated worktree): implements and tests a single hypothesis, then returns a structured report
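
The research unit ⟨h, ι, µ⟩ and the coordinator/executor split map naturally onto a small data structure. In this minimal sketch, the first-open-node selection rule, the pruning threshold, and the `execute` callable are all hypothetical and are not Arbor's actual policies. `execute` stands in for an executor running in its own worktree.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Tuple

@dataclass
class ResearchNode:
    hypothesis: str                      # h: falsifiable improvement claim
    insight: str = ""                    # ι: reusable reading of the evidence
    status: str = "open"                 # µ: open / done / pruned
    score: Optional[float] = None        # µ: held-out score
    branch: Optional[str] = None         # µ: git branch for this attempt
    children: List["ResearchNode"] = field(default_factory=list)

def walk(node: ResearchNode) -> Iterator[ResearchNode]:
    yield node
    for child in node.children:
        yield from walk(child)

class Coordinator:
    """Owns the tree: selects open nodes, records reports, prunes weak branches."""

    def __init__(self, root: ResearchNode, prune_below: float) -> None:
        self.root, self.prune_below = root, prune_below

    def select(self) -> Optional[ResearchNode]:
        return next((n for n in walk(self.root) if n.status == "open"), None)

    def record(self, node: ResearchNode, score: float, insight: str) -> None:
        node.score, node.insight = score, insight
        node.status = "done" if score >= self.prune_below else "pruned"

    def expand(self, parent: ResearchNode, hypotheses: List[str]) -> None:
        for i, h in enumerate(hypotheses):
            parent.children.append(ResearchNode(h, branch=f"{parent.branch or 'root'}/{i}"))

    def best(self) -> ResearchNode:
        done = [n for n in walk(self.root) if n.status == "done"]
        return max(done, key=lambda n: n.score or 0.0)

def run(coord: Coordinator, execute: Callable[[str], Tuple[float, str]], max_steps: int = 10) -> None:
    """Each step hands one open hypothesis to a short-lived executor."""
    for _ in range(max_steps):
        node = coord.select()
        if node is None:
            break
        score, insight = execute(node.hypothesis)
        coord.record(node, score, insight)
```

Because insights and scores stay on the tree, later hypotheses can build on earlier evidence instead of starting from scratch. That cumulative property is what the paper contrasts with isolated local attempts.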

## Key Results

- 6 real research tasks (model training / harness engineering / data synthesis): best held-out result on all of them
- vs. Codex/Claude Code: **2.5× on average** in relative held-out gain
- MLE-Bench Lite (GPT-5.5): **86.36%** Any Medal

raw/papers/cao-nano-filter-2024.md
@@ -0,0 +1,29 @@
---
title: "NANO Filter Raw Archive"
created: 2026-06-22
type: raw
arxiv: "2410.15832"
source: "https://arxiv.org/abs/2410.15832"
---

# Nonlinear Bayesian Filtering with Natural Gradient Gaussian Approximation

- **Authors**: Wenhan Cao, Tianyi Zhang, Zeju Sun, Chang Liu, Stephen S.-T. Yau, Shengbo Eben Li
- **Affiliations**: Tsinghua University (School of Vehicle and Mobility; Department of Mathematical Sciences), Peking University (College of Engineering), BIMSA
- **arXiv**: 2410.15832 [eess.SY]
- **Submitted**: 2024-10-21 | latest version v4: 2026-03-15
- **DOI**: https://doi.org/10.48550/arXiv.2410.15832

## Abstract

Practical Bayes filters often assume the state distribution of each time step to be Gaussian for computational tractability, resulting in the so-called Gaussian filters. When facing nonlinear systems, Gaussian filters such as extended Kalman filter (EKF) or unscented Kalman filter (UKF) typically rely on certain linearization techniques, which can introduce large estimation errors. To address this issue, this paper reconstructs the prediction and update steps of Gaussian filtering as solutions to two distinct optimization problems, whose optimal conditions are found to have analytical forms from Stein's lemma. It is observed that the stationary point for the prediction step requires calculating the first two moments of the prior distribution, which is equivalent to that step in existing moment-matching filters. In the update step, instead of linearizing the model to approximate the stationary points, we propose an iterative approach to directly minimize the update step's objective to avoid linearization errors. For the purpose of performing the steepest descent on the Gaussian manifold, we derive its natural gradient that leverages Fisher information matrix to adjust the gradient direction, accounting for the curvature of the parameter space. Combining this update step with moment matching in the prediction step, we introduce a new iterative filter for nonlinear systems called **N**atural Gr**a**dient Gaussia**n** Appr**o**ximation filter, or NANO filter for short. We prove that NANO filter locally converges to the optimal Gaussian approximation at each time step. Furthermore, the estimation error is proven exponentially bounded for nearly linear measurement equation and low noise levels through constructing a supermartingale-like property across consecutive time steps.

## Key Concepts

- Natural gradient descent on the Gaussian manifold
- Fisher information matrix
- Moment matching (prediction step)
- Stein's lemma for optimality conditions
- Gibbs posterior for robustness
- Pseudo-Huber loss for outlier handling
- Convergence proof & exponential error bound
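
The update step can be illustrated in one dimension. This is a generic natural-gradient Gaussian variational update (the form sometimes called the Bayesian learning rule), written to match the abstract's description of directly minimizing the update objective on the Gaussian manifold. NANO's exact iteration, step-size choice, and expectation scheme may differ. The 3-point Gauss-Hermite rule for the expectations is an assumption made here for brevity.

```python
import math
from typing import Callable, Tuple

# 3-point Gauss-Hermite rule for a standard normal (exact up to degree-5 polynomials).
GH_NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
GH_WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def expect(f: Callable[[float], float], mean: float, var: float) -> float:
    """E[f(x)] for x ~ N(mean, var) via Gauss-Hermite quadrature."""
    std = math.sqrt(var)
    return sum(w * f(mean + std * z) for z, w in zip(GH_NODES, GH_WEIGHTS))

def natural_gradient_update(
    y: float,
    h: Callable[[float], float],
    dh: Callable[[float], float],
    d2h: Callable[[float], float],
    R: float,
    m0: float,
    P0: float,
    iters: int = 10,
    step: float = 1.0,
) -> Tuple[float, float]:
    """Fit q = N(m, P) by minimizing E_q[-log p(y|x)] + KL(q || N(m0, P0)),
    where p(y|x) = N(y; h(x), R). Assumes the precision stays positive."""
    m, P = m0, P0
    for _ in range(iters):
        # Expected gradient and Hessian of the negative log-likelihood under q.
        grad = expect(lambda x: -(y - h(x)) * dh(x) / R, m, P)
        hess = expect(lambda x: (dh(x) ** 2 - (y - h(x)) * d2h(x)) / R, m, P)
        # Natural-gradient step: precision first, then the mean with the new covariance.
        precision = (1.0 - step) / P + step * (1.0 / P0 + hess)
        P = 1.0 / precision
        m = m - step * P * (grad + (m - m0) / P0)
    return m, P
```

For a linear measurement h(x) = a·x, a single step with `step=1` reproduces the Kalman update exactly. With a nonlinear `h`, repeated steps move the Gaussian toward the point where the expected likelihood gradient balances the prior, without the one-shot linearization an EKF relies on.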

raw/papers/dao-transformers-are-ssms-2024.md
@@ -0,0 +1,33 @@
---
title: "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality"
source: arXiv
source_id: 2405.21060
authors:
- Tri Dao (Princeton University)
- Albert Gu (Carnegie Mellon University)
published: 2024-05-31
venue: ICML 2024
categories:
- cs.LG
---

# Transformers are SSMs

## Abstract

While Transformers dominate language modeling, state-space models (SSMs) such as Mamba have matched or outperformed them at small-to-medium scale. This paper shows these model families are closely related through **structured state space duality (SSD)**, connected via **semiseparable matrices**. The SSD framework enables Mamba-2, whose core layer (a refined selective SSM) is 2-8x faster than Mamba's while remaining competitive with Transformers.

## Core Contributions

1. **SSD Framework**: Equivalence between SSMs and semiseparable matrices → connects SSM recurrence with attention-like quadratic forms
2. **Structured Masked Attention (SMA)**: Generalizes linear attention with data-dependent position masks
3. **SSD Algorithm**: Block decomposition of semiseparable matrices, leveraging both linear (recurrent) and quadratic (attention-like) forms
4. **Mamba-2 Architecture**: Multi-head SSM design with tensor parallelism support
5. **Systems Optimizations**: tensor parallelism (TP), sequence parallelism, variable-length training
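
The duality in contribution 1 can be checked numerically for an SSM whose state transition is a scalar decay (A_t = a_t·I). For this case, running the recurrence h_t = a_t h_{t-1} + B_t x_t, y_t = C_tᵀ h_t gives the same outputs as multiplying x by the lower-triangular semiseparable matrix M[i][j] = C_iᵀ B_j · a_{j+1} ⋯ a_i. The pure-Python sketch below works on a single channel and illustrates the equivalence only, not the blocked SSD algorithm.

```python
from typing import List

def ssm_recurrent(a: List[float], B: List[List[float]], C: List[List[float]], x: List[float]) -> List[float]:
    """Linear-time form: h_t = a_t * h_{t-1} + B_t * x_t, y_t = C_t . h_t."""
    h = [0.0] * len(B[0])
    ys = []
    for t, x_t in enumerate(x):
        h = [a[t] * h_n + b_n * x_t for h_n, b_n in zip(h, B[t])]
        ys.append(sum(c_n * h_n for c_n, h_n in zip(C[t], h)))
    return ys

def semiseparable_matrix(a: List[float], B: List[List[float]], C: List[List[float]]) -> List[List[float]]:
    """M[i][j] = (C_i . B_j) * a_{j+1} * ... * a_i for j <= i, else 0."""
    T = len(a)
    M = [[0.0] * T for _ in range(T)]
    for i in range(T):
        for j in range(i + 1):
            decay = 1.0
            for k in range(j + 1, i + 1):
                decay *= a[k]
            M[i][j] = decay * sum(c * b for c, b in zip(C[i], B[j]))
    return M

def ssm_quadratic(a: List[float], B: List[List[float]], C: List[List[float]], x: List[float]) -> List[float]:
    """Attention-like form: y = M x with the causal semiseparable mask."""
    M = semiseparable_matrix(a, B, C)
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
```

The recurrent form costs O(T) steps, while the quadratic form exposes matrix multiplications that GPUs handle well. The SSD algorithm mixes the two block by block.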

## Key Concepts

- Structured State Space Duality (SSD), Semiseparable Matrices
- Structured Masked Attention (SMA), Linear Attention
- Selective SSMs, Scalar SSM, Head Structure for SSMs (MIS/MVA/GVA)
- SSD Algorithm, Block Decomposition, Tensor Contraction Duality

## URL

https://arxiv.org/abs/2405.21060

raw/papers/engram-conditional-memory-2026.md
@@ -0,0 +1,32 @@
---
title: "Engram: Conditional Memory via Scalable Lookup (Raw Archive)"
created: 2026-06-25
updated: 2026-06-25
type: raw
tags: ["conditional-memory", "sparsity", "ngram", "mixture-of-experts"]
source: "https://arxiv.org/abs/2601.07372"
---

# Engram: Conditional Memory via Scalable Lookup (Raw Archive)

## Metadata

- **Title**: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
- **Authors**: Xin Cheng, Wangding Zeng, Damai Dai, Qinyu Chen, Bingxuan Wang, Zhenda Xie, Kezhao Huang, Xingkai Yu, Zhewen Hao, Yukun Li, Han Zhang, Huishuai Zhang, Dongyan Zhao, Wenfeng Liang
- **Affiliations**: Peking University, DeepSeek-AI
- **arXiv**: 2601.07372
- **Date**: 2026-01-12
- **Categories**: cs.CL, cs.AI
- **Code**: https://github.com/deepseek-ai/Engram

## Abstract

While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embedding for O(1) lookup. By formulating the Sparsity Allocation problem, we uncover a U-shaped scaling law that optimizes the trade-off between neural computation (MoE) and static memory (Engram). Guided by this law, we scale Engram to 27B parameters, achieving superior performance over a strictly iso-parameter and iso-FLOPs MoE baseline. Most notably, while the memory module is expected to aid knowledge retrieval (e.g., MMLU +3.4; CMMLU +4.0), we observe even larger gains in general reasoning (e.g., BBH +5.0; ARC-Challenge +3.7) and code/math domains (HumanEval +3.0; MATH +2.4). Mechanistic analyses reveal that Engram relieves the backbone's early layers from static reconstruction, effectively deepening the network for complex reasoning. Furthermore, by delegating local dependencies to lookups, it frees up attention capacity for global context, substantially boosting long-context retrieval (e.g., Multi-Query NIAH: 84.2 to 97.0).

## Key Contributions

1. Conditional memory as a new sparsity axis complementary to MoE
2. Engram module: modernized N-gram embedding with multi-head hashing, context-aware gating, depthwise convolution
3. Sparsity Allocation problem and U-shaped scaling law
4. Infrastructure-aware design: deterministic addressing enables host memory prefetching
5. Empirical validation at 27B-40B scale with comprehensive ablation
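
The core of a hashed N-gram memory is an O(1) table lookup whose address depends only on token IDs. The sketch below shows multi-head hashing under several assumptions: blake2b as the hash, suffix N-grams ending at each position, and concatenation across heads. Engram's context-aware gating and depthwise convolution are omitted, and all names are hypothetical. Because addresses are a pure function of the tokens, they can be computed before the forward pass, which is what makes host-memory prefetching possible.

```python
import hashlib
import random
from typing import List, Sequence, Tuple

def ngram_slot(ngram: Tuple[int, ...], head: int, table_size: int) -> int:
    """Deterministic hash of (head, n-gram) into a table slot."""
    key = f"{head}:{','.join(map(str, ngram))}".encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % table_size

class HashedNgramMemory:
    def __init__(self, table_size: int, dim: int, num_heads: int, n: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.table_size, self.num_heads, self.n = table_size, num_heads, n
        # One embedding table per hash head (learned in practice; random here).
        self.tables = [
            [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(table_size)]
            for _ in range(num_heads)
        ]

    def addresses(self, token_ids: Sequence[int]) -> List[List[int]]:
        """Slots for every position and head, computable ahead of time."""
        out = []
        for t in range(len(token_ids)):
            ngram = tuple(token_ids[max(0, t - self.n + 1): t + 1])
            out.append([ngram_slot(ngram, h, self.table_size) for h in range(self.num_heads)])
        return out

    def lookup(self, token_ids: Sequence[int]) -> List[List[float]]:
        """Concatenate each head's embedding for the n-gram ending at each position."""
        return [
            [x for h, slot in enumerate(slots) for x in self.tables[h][slot]]
            for slots in self.addresses(token_ids)
        ]
```

Using several hash heads reduces the chance that two frequent N-grams collide in every table at once.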

raw/papers/fei-mcp-zero-2025.md
@@ -0,0 +1,56 @@
---
title: "MCP-Zero: Active Tool Discovery for Autonomous LLM Agents"
created: 2026-06-19
updated: 2026-06-19
type: paper-raw
source: https://arxiv.org/abs/2506.01056
arxiv_id: 2506.01056
version: v4
---

# MCP-Zero: Active Tool Discovery for Autonomous LLM Agents

**Authors**: Xiang Fei, Xiawu Zheng*, Hao Feng (Xiamen University, USTC)
**Published**: 2025-06-01 (v4: 2025-06-24)
**Venue**: arXiv:2506.01056 (cs.AI, cs.SE)
**Code**: https://github.com/xfey/MCP-Zero

## Core Insight

Tool use in current LLM agents is **passive**: every tool schema is injected into the system prompt, and the model picks from among them. This has two critical problems: (1) context overhead explodes (the GitHub MCP server alone needs 4,600+ tokens, and the full ecosystem 248K tokens); (2) decision autonomy is taken away, reducing the model from an "autonomous capability builder" to a "passive selector".

MCP-Zero flips the paradigm to **Active Tool Discovery**: the agent identifies its own capability gaps and generates structured tool requests on demand, and the system matches them and returns the tools.

## Three Mechanisms

### 1. Active Tool Request

The model generates structured requests on its own:

```
<tool_assistant>
server: File system allowing file operations
tool: Read file by filename
</tool_assistant>
```

Key point: the request is phrased in the **semantic space of tool documentation**, so it aligns more closely with tool descriptions than the raw user query does.

### 2. Hierarchical Semantic Routing

Two-stage, coarse-to-fine retrieval:

- Stage 1: the `server` field → matched against server descriptions (including enhanced summaries)
- Stage 2: the `tool` field → tools ranked within the selected servers
- Scoring: score = (s_server × s_tool) × max(s_server, s_tool)
- Complexity drops from O(n) to O(m+k), with m+k ≪ n
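
The two-stage routing and its scoring rule fit in a few lines. In this sketch, bag-of-words cosine similarity stands in for the embedding model, and the registry layout, `top_servers`, and `top_tools` are hypothetical. Only the score formula comes from the description of the scoring rule.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def embed(text: str) -> Counter:
    """Stand-in embedding: bag of lowercase words."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# server name -> (server description, {tool name: tool description})
Registry = Dict[str, Tuple[str, Dict[str, str]]]

def route(server_req: str, tool_req: str, registry: Registry,
          top_servers: int = 2, top_tools: int = 3) -> List[Tuple[float, str, str]]:
    # Stage 1: rank servers against the request's `server` field.
    ranked = sorted(
        ((cosine(embed(server_req), embed(desc)), name) for name, (desc, _) in registry.items()),
        reverse=True,
    )[:top_servers]
    results = []
    for s_server, name in ranked:
        # Stage 2: rank tools only inside the selected servers.
        for tool, tool_desc in registry[name][1].items():
            s_tool = cosine(embed(tool_req), embed(tool_desc))
            score = (s_server * s_tool) * max(s_server, s_tool)
            results.append((score, name, tool))
    return sorted(results, reverse=True)[:top_tools]
```

The `max` term rewards a match that is strong at either level, while the product keeps a tool from winning on its own description inside an irrelevant server.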

### 3. Iterative Capability Extension

Supports multi-round iterative discovery: the model can build a cross-domain toolchain step by step (file → edit → execute), and when the current tools fall short, it can refine its request and search again.

## Key Numbers

- MCP-tools dataset: 308 servers, 2,797 tools
- **98%** lower token consumption on APIBank while maintaining high accuracy
- Accurate selection within a 248.1K-token space of tool descriptions

## Theoretical Analysis

- Active discovery modeled as active learning: r* = arg max I(T*; r | s_t)
- Attention distribution: passive O(1/n) → active O(1/k), with k ≪ n
- Semantic alignment advantage: cos(e_r, e_t) > cos(e_q, e_t)

raw/papers/gan-bifurcation-eos-2026.md
@@ -0,0 +1,36 @@
---
title: "A Bifurcation Theory Framework for Gradient Descent on the Edge of Stability"
created: 2026-06-23
type: paper-raw
arxiv: "2606.15551v1"
category: cs.LG
author: "Eric Gan"
date: 2026-06-14
venue: Preprint
---

# A Bifurcation Theory Framework for Gradient Descent on the Edge of Stability

- **Author**: Eric Gan (Independent Researcher, egan8@ucla.edu)
- **arXiv**: 2606.15551v1
- **Field**: cs.LG (Machine Learning)
- **Date**: 2026-06-14
- **Source**: https://arxiv.org/abs/2606.15551

## Abstract

The Edge of Stability (EoS) phenomenon, where gradient descent operates with sharpness exceeding the classical convergence threshold yet the loss decreases over long timescales, is ubiquitous in modern deep learning but remains poorly understood in realistic settings. Prior rigorous analyses have been largely confined to scalar or low-dimensional losses with specific structural forms. In this work, we develop a bifurcation theory framework for gradient descent on the edge of stability that applies directly to overparameterized neural networks. By decomposing the training dynamics into components normal and tangent to the manifold of minimizers, we show that stable EoS training arises from a flip bifurcation in the normal direction, governed by the sign of the first Lyapunov coefficient, while the tangent dynamics drift toward regions of decreasing sharpness. Under mild spectral and geometric assumptions on the loss landscape, we prove convergence to the minimizing manifold when training at the EoS threshold. As a corollary, we recover and unify prior results: we show that the product-stability condition of Gan (2026) is an instance of our framework.

## Core Contributions

1. Develops a bifurcation-theory framework for EoS that applies to overparameterized networks
2. Decomposes EoS dynamics into a flip bifurcation in the normal direction plus a sharpness-decreasing drift in the tangent direction
3. Proves convergence to the manifold of minimizers when training at the EoS threshold (η = 2/λ_max) (Theorem 4.4)
4. Unifies product stability (Gan 2026) as a special case of the framework

## Key Technical Tools

- Center Manifold Theorem
- Projection Method
- First Lyapunov coefficient (c₁)
- Morse-Bott condition + spectral gap assumption

raw/papers/gan-thinking-based-non-thinking-2026.md
@@ -0,0 +1,39 @@
---
title: "Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning"
source: arXiv
source_id: 2601.04805
authors:
- Siyuan Gan (Nanjing University)
- Jiaheng Liu (Nanjing University)
- Boyan Wang (Nanjing University)
- Tianpei Yang (Nanjing University)
- Runqing Miao (Jiutian Research)
- Yuyao Zhang (Jiutian Research)
- Fanyu Meng (Jiutian Research)
- Junlan Feng (Jiutian Research)
- Linjian Meng (Shanghai AI Laboratory)
- Jing Huo (Nanjing University)
- Yang Gao (Nanjing University)
published: 2026-01-08
updated: 2026-06-07
categories:
- cs.AI
venue: Preprint
---

# Thinking-Based Non-Thinking (TNT)

## Abstract

Large reasoning models (LRMs) achieve exceptional performance via long Chain-of-Thought (thinking), which causes substantial computational overhead: the overthinking problem. RL-trained hybrid reasoning models that dynamically choose between thinking and non-thinking modes suffer from **reward hacking**: the model generates thinking-like responses while being classified as non-thinking, receiving undeserved rewards.

Existing mitigations: (1) SFT with large datasets (high cost), or (2) uniform token limits on non-thinking (ineffective for varied query difficulties). TNT proposes **per-query dynamic token limits** derived from the thinking mode's solution length, leveraging the fact that LRMs' thinking mode ensures its solution component contains no additional thinking.

## Core Contributions

1. **TNT (Thinking-Based Non-Thinking)**: Dynamic per-query maximum token usage for non-thinking mode, derived from the solution component of thinking mode responses
2. **50% token reduction** vs DeepSeek-R1-Distill-Qwen while **improving accuracy** across 5 math benchmarks
3. **Optimal accuracy-efficiency trade-off** among all tested hybrid reasoning methods
4. **<10% reward hacking rate** across all datasets
5. Compatible with any RL algorithm (GRPO, PPO, DAPO, Dr.GRPO, GSPO)
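
The per-query cap can be expressed as a reward rule. This minimal illustration makes three assumptions: responses mark the end of thinking with `</think>`, whitespace tokens serve as a length proxy, and the cap is the longest solution part among sampled thinking-mode responses. The paper's exact way of deriving the limit and shaping the reward may differ.

```python
from typing import List, Optional, Tuple

END_THINK = "</think>"

def split_response(response: str) -> Tuple[str, str]:
    """Return (thinking, solution); without the tag, the whole text is the solution."""
    if END_THINK in response:
        thinking, solution = response.split(END_THINK, 1)
        return thinking, solution
    return "", response

def token_len(text: str) -> int:
    return len(text.split())  # whitespace tokens as a stand-in for a tokenizer

def non_thinking_limit(thinking_responses: List[str]) -> Optional[int]:
    """Per-query cap: longest solution part among thinking-mode responses."""
    lengths = [token_len(split_response(r)[1]) for r in thinking_responses if END_THINK in r]
    return max(lengths) if lengths else None

def non_thinking_reward(response: str, correct: bool, limit: Optional[int]) -> float:
    # A "non-thinking" answer longer than any thinking-mode solution is
    # likely hidden reasoning, so it earns no reward (reward hacking).
    if limit is not None and token_len(response) > limit:
        return 0.0
    return 1.0 if correct else 0.0
```

Because the cap is computed per query, hard problems with long worked solutions get more room than easy ones. A single uniform limit cannot provide that.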

## URL

https://arxiv.org/abs/2601.04805

raw/papers/gaurav-dynamic-react-2025.md
@@ -0,0 +1,53 @@
---
title: "Dynamic ReAct: Scalable Tool Selection for Large-Scale MCP Environments"
created: 2026-06-19
updated: 2026-06-19
type: paper-raw
source: https://arxiv.org/abs/2509.20386
arxiv_id: 2509.20386
version: v1
---

# Dynamic ReAct: Scalable Tool Selection for Large-Scale MCP Environments

**Authors**: Nishant Gaurav, Adit Akarsh, Ankit Ranjan, Manoj Bajaj (agentr.dev)
**Published**: 2025-09-22
**Venue**: arXiv:2509.20386 (cs.SE, cs.AI, cs.IR)

## Core Problem

When the MCP tool ecosystem grows to hundreds or thousands of tools, the traditional ReAct agent approach of loading every tool becomes infeasible, because LLM context windows have hard limits.

## Five Architectures, in Order of Evolution

### 1. Baseline: Direct Semantic Search

The user query goes straight into the vector store → top-k results → bound to the LLM. Simple but very noisy (a query about "unsubscribe links" returns Mailchimp's unsubscribe report instead of Gmail tools).

### 2. Meta-Tool Query Construction

Vector search is exposed as a meta-tool: the LLM first constructs atomic search queries, then retrieves. More precise, but it still needs a large k.

### 3. Search and Load (★ best)

Two meta-tools: `search_tools` (two-stage search, k1=20 → deduplicate → at most k2=5 per application) + `load_tools` (the LLM picks tools and loads them explicitly). Results of multiple queries are merged, and fewer than 5 tools are loaded precisely.
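
The `search_tools` funnel can be sketched by reading "k1=20 → deduplicate → at most k2=5 per application" literally. `search_fn` is a hypothetical vector-search callable that returns tool records with `name` and `app` keys; neither it nor `load_tools` below is the paper's code.

```python
from typing import Callable, Dict, List

Tool = Dict[str, str]  # expects "name" and "app" keys

def search_tools(queries: List[str], search_fn: Callable[[str, int], List[Tool]],
                 k1: int = 20, k2: int = 5) -> List[Tool]:
    """Merge results of several atomic queries with dedup and a per-app cap."""
    seen = set()
    per_app: Dict[str, int] = {}
    results: List[Tool] = []
    for query in queries:
        for tool in search_fn(query, k1):          # stage 1: top-k1 per query
            if tool["name"] in seen:               # stage 2: drop duplicates
                continue
            seen.add(tool["name"])
            if per_app.get(tool["app"], 0) >= k2:  # stage 3: cap each application
                continue
            per_app[tool["app"]] = per_app.get(tool["app"], 0) + 1
            results.append(tool)
    return results

def load_tools(names: List[str], registry: Dict[str, Tool], max_tools: int = 5) -> List[Tool]:
    """Explicitly bind only the tools the LLM chose."""
    if len(names) > max_tools:
        raise ValueError(f"refusing to load {len(names)} tools (max {max_tools})")
    return [registry[name] for name in names]
```

The per-application cap keeps one large app (for example, one with dozens of near-identical endpoints) from crowding every other app out of the candidate list.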

### 4. Application-Aware (Hierarchical Search)

Adds `search_apps` to locate the application before searching for tools. Application filtering helps little in semantic search, because the LLM tends to put the app name directly into the query anyway.

### 5. Fixed Tool Set

Four fixed meta-tools fetch tool information dynamically and invoke tools. Cache efficiency is good, but performance degrades over long conversations.

## Vector Retrieval Optimization

| Strategy | Top-5 | Top-10 |
|------|-------|--------|
| OpenAI text-embedding-3-large (baseline) | 40% | 64% |
| voyage-context-3 | 48% | 68% |
| **voyage-context-3 + Sonnet context enrichment** | **60%** | 68% |
| + BM25 hybrid | 56% | 72% |

Context enrichment yields a 50% relative improvement (Top-5: 40% → 60%).

## Key Innovations

- **Default tools**: `create_table` + `web_search` are always available, so general tasks don't waste searches
- **Meta-tools as "seven levers"**: LLM client (1) + meta-tools (4) + tool registry (1) + vector DB (1)
- **50%** fewer tools loaded, with no loss in accuracy

raw/papers/gu-mamba-2024.md
@@ -0,0 +1,94 @@
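---
title: "Mamba: Linear-Time Sequence Modeling with Selective State Spaces"
authors: ["Albert Gu", "Tri Dao"]
date: 2023-12-01
arxiv_id: "2312.00752v2"
categories: ["cs.LG", "cs.AI"]
affiliations: ["Carnegie Mellon University", "Princeton University"]
paper_type: "conference"
code: "https://github.com/state-spaces/mamba"
---

# Mamba: Linear-Time Sequence Modeling with Selective State Spaces

## Abstract

Almost all foundation models are built on the Transformer architecture, but the quadratic complexity of attention makes it very inefficient on long sequences. Many subquadratic architectures (linear attention, gated convolutions, structured state space models) have tried to replace attention, yet none has matched Transformer quality on core modalities such as language. This paper identifies their key weakness, **the lack of content-based reasoning**, and addresses it with two innovations:

1. Making the SSM parameters functions of the input (the selection mechanism, S6), so the model can selectively propagate or forget information depending on the current token.
2. A hardware-aware parallel algorithm that computes the model efficiently in recurrent mode.

The result is a minimal architecture, Mamba, with no attention and not even MLP blocks. Mamba has 5× higher inference throughput than Transformers, scales linearly in sequence length, and reaches state-of-the-art results across language, audio, and genomics. Mamba-3B outperforms Transformers of the same size and matches Transformers twice its size.

## Core Contributions

1. **Selection mechanism (S6)**: makes the SSM parameters (Δ, B, C) input-dependent, turning a linear time-invariant (LTI) system into a time-varying one
2. **Hardware-aware algorithm**: computes the recurrence with a parallel associative scan inside SRAM, avoiding IO between levels of the GPU memory hierarchy (HBM and SRAM)
3. **Minimal Mamba architecture**: merges the SSM layer of the H3 architecture with the gated MLP into a single homogeneous block
4. **Synthetic tasks (Selective Copying and Induction Heads)**: Mamba not only solves them easily but also extrapolates the solutions indefinitely (>1M tokens)

## Method

### From S4 to S6

The key limitation of conventional S4 is **linear time invariance (LTI)**: the parameters (Δ, A, B, C) are the same at every time step. The state update rule therefore cannot change with the input content, so the model cannot "selectively" attend to or ignore particular tokens.
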
Mamba's selection mechanism (S6) makes B, C, and Δ functions of the input x:

```
B_t = s_B(x_t)             # input → input projection (Linear_N)
C_t = s_C(x_t)             # input → output projection (Linear_N)
Δ_t = τ_Δ(Δ + s_Δ(x_t))    # input-dependent step size, with τ_Δ = softplus
```

Key differences:

| Property | S4 (LTI) | S6 (Selective) |
|------|---------|---------------|
| Parameters | Time-invariant | Time-varying (input-dependent) |
| Computation mode | Convolution or recurrence | Recurrence only (requires a scan) |
| Selectivity | None | Yes (filter/retain) |
| Content-aware | No | Yes |

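The following is a sequential reference sketch of the selective (S6) recurrence. It makes the input dependence of B, C, and Δ concrete. The sketch rests on three assumptions:

- A is diagonal per channel.
- A is discretized with zero-order hold (Ā = exp(ΔA)).
- B uses the simplified Euler step B̄ = ΔB.

The weight names (`W_B`, `W_C`, `W_dt`, `dt_bias`) are illustrative, not the official `mamba_ssm` API.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)); this plays the role of tau_Delta.
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)


def selective_scan(x, A, W_B, W_C, W_dt, dt_bias):
    """Sequential S6 recurrence for one sequence.

    x: (L, D) inputs; A: (D, N) negative entries of a diagonal state matrix;
    W_B, W_C: (D, N) projections for B_t and C_t; W_dt: (D, 1) projection for
    Delta_t (broadcast to all D channels); dt_bias: (D,) learned Delta offset.
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                    # one N-dim state per channel
    y = np.empty((L, D))
    for t in range(L):
        xt = x[t]
        B_t = xt @ W_B                      # (N,) input-dependent B
        C_t = xt @ W_C                      # (N,) input-dependent C
        dt = softplus(dt_bias + xt @ W_dt)  # (D,) input-dependent step size
        A_bar = np.exp(dt[:, None] * A)     # zero-order-hold discretization of A
        B_bar = dt[:, None] * B_t[None, :]  # simplified (Euler) discretization of B
        h = A_bar * h + B_bar * xt[:, None]
        y[t] = h @ C_t
    return y
```

Because Δ_t depends on x_t, a token that drives Δ_t toward 0 leaves the state almost untouched (Ā ≈ 1, B̄ ≈ 0). A large Δ_t, by contrast, decays the old state and focuses on the current input. This is the selective propagate-or-forget behaviour.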
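### Hardware-Aware Parallel Scan

Selection removes the equivalence to a convolution: the model becomes time-varying and can no longer be computed in parallel as a global convolution. Mamba instead uses a **parallel associative scan (Blelloch scan)**:

1. Express the state update as a prefix scan (a generalized prefix sum) over an associative operator
2. Fuse the kernels in GPU SRAM so the expanded state is never written to HBM
3. The inputs stay in HBM. They are loaded into SRAM, discretized and scanned there, and only the outputs are written back to HBM

Result: up to 3× faster than previous (convolution-based) SSM computation on A100 GPUs.
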
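The scan works because each step h_t = a_t·h_{t-1} + b_t is an affine map, and composing affine maps is associative. The sketch below checks this for a scalar recurrence. It uses a Hillis-Steele style inclusive scan, which is simpler but less work-efficient than the Blelloch scan named above. Its O(log L) rounds are what a GPU kernel would parallelize.

```python
import numpy as np


def combine(first, second):
    """Compose affine maps h -> a*h + b: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return a1 * a2, a2 * b1 + b2


def sequential_scan(a, b):
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.array(out)


def parallel_style_scan(a, b):
    """Inclusive Hillis-Steele scan: log2(L) rounds; within a round every
    position is updated independently, so a GPU could do them in parallel."""
    a = np.array(a, dtype=float)
    b = np.array(b, dtype=float)
    step = 1
    while step < len(a):
        prev_a, prev_b = a.copy(), b.copy()
        for i in range(step, len(a)):
            a[i], b[i] = combine((prev_a[i - step], prev_b[i - step]), (prev_a[i], prev_b[i]))
        step *= 2
    return b  # with h_0 = 0, the composed offset at position t equals h_t
```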
### Mamba Architecture

```
Input → Mamba Block → ... (×L) → Output

Mamba Block:
x → Norm → Linear(expand ×E), split into two branches:
    main: Conv1d → SiLU → SSM(S6)
    gate: SiLU
main ⊙ gate → Linear(project back to d_model) → + x (residual)
```

Key design choices:
- **No attention, no MLP**: a single selective-SSM block replaces both
- **Expansion factor E=2**: the input Linear expands d_model by 2×, and the output Linear projects back
- **Residual connections + SiLU activations**
- **Simplifies H3**: combines the H3 block with the gated MLP block into one block; H3's two SSMs give way to a single selective SSM on the main branch

## Experimental Results

- **Synthetic tasks**: on Selective Copying and Induction Heads, Mamba generalizes to sequences of >1M tokens
- **Language modeling**: Mamba-3B beats Pythia-3B on pretraining perplexity and zero-shot evaluations and matches Pythia-7B, with 5× higher inference throughput
- **Audio**: more than halves the FID on SC09 speech generation
- **Genomics**: outperforms HyenaDNA and Transformer baselines on DNA modeling

## Key Concepts

- [[selective-state-space]]: the S6 selection mechanism, an input-dependent SSM parameterization
- [[hardware-aware-algorithm]]: a parallel scan optimized for the GPU memory hierarchy
- [[structured-state-space-models]]: S4, the predecessor (HiPPO matrix + diagonal structure)
- [[selective-copy]]: a copying task that requires content awareness
- [[induction-heads]]: a mechanism that explains in-context learning in LLMs
- [[hippo]]: the mathematical foundation of SSMs (High-order Polynomial Projection Operators)
- [[content-based-reasoning]]: the core weakness that Mamba identifies and addresses

## References

- Code: https://github.com/state-spaces/mamba
- S4 (Gu et al., 2022)
- H3 (Dao et al., 2023)
- Copying task, the basis of Selective Copying (Arjovsky et al., 2016)
- Induction heads (Olsson et al., 2022)

43
raw/papers/hazare-dcgwm-2026.md
Normal file
@@ -0,0 +1,43 @@
---
title: "Dual-Channel Grounded World Modeling (DCGWM)"
source_id: "arXiv:2606.18688v1"
authors:
- "Akshay Hazare"
affiliations: "Independent Researcher"
date: 2026-06-17
categories: ["cs.LG", "cs.AI"]
note: "Position paper. Experimental validation in progress."
url: "https://arxiv.org/abs/2606.18688v1"
---

# Dual-Channel Grounded World Modeling (DCGWM)

**Authors**: Akshay Hazare (Independent)
**arXiv**: 2606.18688v1 | **Date**: 2026-06-17
**Categories**: cs.LG, cs.AI
**Position paper — experimental validation ongoing**

## Abstract

Joint Embedding Predictive Architectures (JEPAs) are a leading approach to world model representation learning. We identify a failure mode in JEPA-based world models grounded against two qualitatively distinct external signals: physical dynamics (sparse, high-magnitude, constraint-satisfying gradient corrections) and social-behavioral dynamics (diffuse, distribution-matching corrections). We term this **Objective Interference Collapse (OIC)**: joint learning in a shared latent space causes the dominant channel to systematically collapse the subordinate channel's representational subspace, in a manner not resolvable by loss weighting alone.

We propose **Dual-Channel Grounded World Modeling (DCGWM)**, designed to structurally prevent OIC through a partitioned latent space (Z_p ⊕ Z_b) with inward-only gradient flow. The Physical Grounding Channel updates only Z_p via VICReg-style alignment; the Social-Behavioral Grounding Channel updates only Z_b via alignment to emergent multi-agent simulation trajectories. An Inter-Channel Interface Module couples subspaces at the task level without cross-subspace gradients. An Asymmetric Grounding Adherence Loss penalizes rollout drift with a hard hinge for physical violations and a soft KL for behavioral divergence. A Generative Rendering Layer is architecturally isolated from the latent world model.

Three theoretical results: the partition removes the gradient-interference pathway; each grounded subspace inherits anti-collapse guarantees; generative isolation is necessary under stated assumptions.

## Key Contributions

1. **Objective Interference Collapse**: Formalization of a new collapse mode — when two grounding signals with incompatible statistical structures share a latent space
2. **DCGWM Architecture**: Partitioned latent space + inward-only gradient flow + separated grounding channels
3. **Asymmetric Grounding Adherence Loss (L_AGA)**: First loss for rollout drift under heterogeneous grounding with incompatible tolerance structures
4. **Isolation Necessity Theorem**: Under assumptions A1-A2, any α > 0 generative gradient causes world model drift
5. **LLM World Modeling Critique**: NTP-trained LLMs face inherent subspace collapse that DCGWM avoids by design

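A minimal sketch of the asymmetry in L_AGA. It assumes the physical channel yields constraint residuals (decoded from Z_p) and the behavioral channel yields action distributions (decoded from Z_b). The function names, the weights `lam_p`/`lam_b`, and the tolerance are hypothetical, since this note does not reproduce the paper's exact formulation.

```python
import numpy as np


def hinge_violation(residuals, tol=0.0):
    # Hard penalty for physics: zero inside the tolerance band, linear outside it.
    return float(np.maximum(0.0, np.abs(residuals) - tol).sum())


def kl_divergence(p, q, eps=1e-12):
    # Soft penalty for behavior: KL(p || q) between categorical distributions.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def aga_loss(phys_residuals, behav_ref, behav_pred, lam_p=1.0, lam_b=0.1, tol=0.0):
    """Hypothetical L_AGA over a rollout.

    phys_residuals: (T, K) constraint residuals decoded from Z_p only.
    behav_ref, behav_pred: (T, M) reference and predicted action distributions
    decoded from Z_b only.
    """
    l_phys = hinge_violation(phys_residuals, tol)
    l_behav = sum(kl_divergence(r, p) for r, p in zip(behav_ref, behav_pred))
    return lam_p * l_phys + lam_b * l_behav
```

The two terms behave differently by design. A physical residual inside the tolerance costs nothing, while any behavioral divergence costs something in proportion to its KL.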
## Key Concepts

- [[objective-interference-collapse|OIC]]: the new collapse mode this paper identifies
- [[dcgwm|DCGWM]]: the architecture
- [[inward-only-gradient-flow|Inward-Only Gradient Flow]]: the key separation mechanism
- [[asymmetric-grounding-adherence-loss|L_AGA]]: asymmetric rollout drift penalty
- [[rollout-drift|Rollout Drift]]: multi-step prediction error accumulation
- [[isolation-necessity-theorem|Isolation Necessity]]: formal generative isolation result

71
raw/papers/jordan-collectivist-ai-2025.md
Normal file
@@ -0,0 +1,71 @@
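---
title: "A Collectivist, Economic Perspective on AI"
author: Michael I. Jordan
arxiv_id: "2507.06268"
categories: cs.CY, cs.AI, stat.ML
date: 2025-07-08
updated: 2025-12-15 (v3)
url: https://arxiv.org/abs/2507.06268
type: paper
tags:
- ai-economics
- collective-intelligence
- uncertainty
- mechanism-design
- foundation-models
---

## Abstract

Information technology is in the midst of a revolution: ubiquitous data collection and machine learning are affecting the human world in unprecedented ways. "Intelligence" serves as the North Star of technology development, with human cognition as the baseline. This view overlooks the fact that humans are social animals and that much of our intelligence is social and cultural in origin. The way forward is neither more data and compute nor more attention to cognition or symbolic representation. It is **deeply blending economic and social concepts with computational and inferential concepts at the level of algorithm design**.

## Core Framework: Blending Three Ways of Thinking

Jordan proposes blending three ways of thinking into a new foundation for AI system design:

```
Computational thinking → modularity, abstraction, scale
Inferential thinking   → data collection and prediction under uncertainty
Economic thinking      → incentive mechanisms, game-theoretic equilibria
```

Pairwise blends have already become fields of their own (e.g., algorithmic game theory), but the goal is a full blend of all three. The paper illustrates concrete forms of this blend through several case studies.

## Key Case Studies

### 1. Inferential Thinking in Database Design (§2)

Traditional databases focus on computation (privacy protection, query optimization). **Inferential thinking** brings a different perspective: the goal is not to look up patients already in the database but to **make predictions, with quantified uncertainty, for new patients drawn from the same population**. This calls for generative models and causal inference ("what if" questions).

### 2. Statistical Contract Theory (§3)

[[statistical-contract-theory|Statistical contract theory]] (Bates et al., 2024) embeds hypothesis testing in the design of economic contracts. Key result: in a sequential game, a contract is incentive-compatible if and only if its options can be expressed as **[[e-values|E-values]]**. These are nonnegative statistics whose expectation under the null hypothesis is at most 1. They can be read as evidence accumulated over time, since their running products form a nonnegative supermartingale.
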
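To make E-values concrete: a likelihood ratio is the textbook example, since its expectation under the null is exactly 1. The sketch below uses a hypothetical Bernoulli setting, not the contract construction of Bates et al. It multiplies per-round e-values into a running product and rejects once the product reaches 1/α. Ville's inequality keeps this valid at any stopping time.

```python
def bernoulli_evalue(x, p0, p1):
    """Likelihood ratio of alternative p1 vs. null p0 for one 0/1 outcome.
    Under H0 its expectation is p0*(p1/p0) + (1-p0)*((1-p1)/(1-p0)) = 1."""
    return p1 / p0 if x == 1 else (1 - p1) / (1 - p0)


def expected_evalue_under_null(p0, p1):
    return p0 * bernoulli_evalue(1, p0, p1) + (1 - p0) * bernoulli_evalue(0, p0, p1)


def sequential_test(xs, p0, p1, alpha=0.05):
    """Multiply per-round e-values. The running product is a nonnegative
    martingale under H0, so rejecting when it reaches 1/alpha controls the
    type-I error at alpha (Ville's inequality), whenever we stop."""
    evidence = 1.0
    for t, x in enumerate(xs, start=1):
        evidence *= bernoulli_evalue(x, p0, p1)
        if evidence >= 1 / alpha:
            return t, evidence  # reject H0 at round t
    return None, evidence
```

For example, with p0 = 0.5, p1 = 0.8 and a stream of successes, each round multiplies the evidence by 1.6. The product first reaches 20 (= 1/0.05) at round 7.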
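### 3. Data Markets (§4.2)

[[data-markets|Three-layer data markets]] (Fallah et al., 2024): users → platform → third-party data buyers. The core tension is that the platform must balance service revenue (from users) against data-sales revenue (from buyers). It must also offer users privacy guarantees to keep them participating. Finding the equilibrium requires modeling this as a generalized Stackelberg game.

### 4. Foundation Models and Prediction-Powered Inference (§4.3)

AlphaFold case: at the edge of its knowledge (proteins subject to quantum fluctuations), it gives highly confident but completely biased predictions. [[prediction-driven-inference|Prediction-powered inference]] (PPI) combines a small amount of local ground-truth data with global foundation-model predictions, so that the confidence intervals cover the true value again.
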
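The standard PPI estimator for a mean can be sketched under the usual i.i.d. assumptions:

1. Average the model's predictions on the large unlabeled set.
2. Subtract the model's average error (the "rectifier") measured on the small labeled set.

The interval uses a normal approximation. Variable names are illustrative.

```python
import numpy as np
from statistics import NormalDist


def ppi_mean_ci(y_labeled, f_labeled, f_unlabeled, alpha=0.05):
    """Prediction-powered estimate and confidence interval for E[Y].

    y_labeled, f_labeled: true outcomes and model predictions on n labeled points.
    f_unlabeled: model predictions on N unlabeled points from the same population.
    """
    y_labeled = np.asarray(y_labeled, dtype=float)
    f_labeled = np.asarray(f_labeled, dtype=float)
    f_unlabeled = np.asarray(f_unlabeled, dtype=float)
    n, N = len(y_labeled), len(f_unlabeled)
    rectifier = f_labeled - y_labeled              # model error on labeled data
    theta = f_unlabeled.mean() - rectifier.mean()  # bias-corrected estimate
    se = np.sqrt(f_unlabeled.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)
```

Consider a predictor that is systematically biased by +0.5. The naive mean of its predictions misses the truth by about 0.5, while the rectified estimate recovers it.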
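### 5. Probability Matching (Appendix C)

[[probability-matching|Probability matching]]: in a mouse maze experiment, the left arm holds twice as much food as the right. A decision-theoretically optimal mouse would always go left, but real mice split their visits 2:1. From a **population perspective** this is a Nash equilibrium: it avoids wasting resources and raises total social welfare. This is a micro-scale example of handling uncertainty collectively.
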
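Here is a worked version of the population argument. As an illustrative modeling assumption not stated in the note, it treats an arm's food as shared equally among the mice that visit it. Under that assumption, no mouse can gain by switching arms exactly when per-mouse intake is equal. That happens at the 2:1 split.

```python
from fractions import Fraction


def equilibrium_share_left(food_left, food_right):
    """Share of mice in the left arm at which no mouse gains by switching.
    Equal intake: food_left / (p*n) == food_right / ((1-p)*n)  =>  p = fl / (fl + fr)."""
    return Fraction(food_left, food_left + food_right)


def per_mouse_intake(n_left, n_right, food_left, food_right):
    # Each arm's food is split equally among the mice that visit it.
    left = Fraction(food_left, n_left) if n_left else None
    right = Fraction(food_right, n_right) if n_right else None
    return left, right


def total_food_collected(n_left, n_right, food_left, food_right):
    # An arm's food is only eaten if at least one mouse goes there.
    return (food_left if n_left else 0) + (food_right if n_right else 0)
```

With 30 mice, the 20/10 split gives every mouse 1/10 of a unit. If all 30 go left, each gets only 1/15, and the right arm's food is wasted (2 units collected instead of 3).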
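## Implications for Education

Appendix B of the paper discusses UC Berkeley's **Data 8** course, which Jordan helped design in 2015. The course blends "computational thinking + inferential thinking": students use Python histograms and permutation tests to answer real-world questions (water quality, deforestation, and so on). It now enrolls 1,500+ students per semester and is the fastest-growing course in Berkeley's history. The next step is adding economic thinking.

## Core Claims

- LLMs can be understood as **collectivist artifacts**: each interaction is implicitly a conversation with the billions of individuals who contributed micro-data
- "The right metaphor for AI is not a search engine or a chatbot, but a **market**"
- A truly mature AI engineering discipline needs **modular, transparent design concepts** on the level of Maxwell's equations, and the field is still far from that
- The path forward is not to narrow AI into a simulation of the human brain but to build **economic and inferential principles into the DNA of algorithm design**

## References

- Bates et al. (2024). Principal-Agent Hypothesis Testing. arXiv:2205.06812
- Angelopoulos et al. (2023). Prediction-Powered Inference. Science 382, 669–674
- Fallah et al. (2024). On Three-Layer Data Markets. arXiv:2402.09697
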
19
raw/papers/large-language-gibbs-2026.md
Normal file
@@ -0,0 +1,19 @@
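# Structured Inference with Large Language Gibbs

- **arXiv**: 2606.19264v1
- **Published**: 2026-06-17
- **Authors**: Sanghyeok Choi, Henry Gouk, Esmeralda S. Whitammer (University of Edinburgh, CIFAR)
- **Categories**: cs.LG, cs.CL
- **Code**: https://github.com/hyeok9855/large-language-gibbs
- **Source**: https://arxiv.org/abs/2606.19264

## Abstract

Large Language Gibbs is a structured probabilistic inference scheme that uses an LLM's conditional distributions as the transition operator of a Gibbs sampler. It does not generate a structured object in a single autoregressive pass. Instead, it repeatedly resamples one variable conditioned on all the others, using the LLM's next-token conditionals. This avoids biases tied to a fixed generation order, and the resulting stationary distribution reflects a compromise among all the local conditionals. Applications: sampling from synthetic distributions, consistency reasoning (GSM8K/TruthfulQA), and Bayesian structure learning.

## Key Contributions

1. Formalizes LLM conditional distributions as Gibbs transition operators and characterizes the stationary distribution q^sym theoretically
2. Proposes three kernel variants: Basic Gibbs (direct conditional sampling), Barker Gibbs (preference comparison), and Gambling Gibbs (betting decisions)
3. A random-permutation strategy removes variable-order bias
4. Validates the method in three applications: correcting sampling bias, consistency reasoning, and causal-structure priors
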
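Below is a minimal random-scan Gibbs sketch in the spirit of the method. `sample_conditional` stands in for querying an LLM for one variable given the rest; in this hypothetical setup it is computed from a known toy joint. Each sweep visits the variables in a fresh random order. With consistent conditionals, the chain's stationary distribution is the joint itself. With an LLM's possibly inconsistent conditionals, it is instead the compromise distribution denoted q^sym above.

```python
import numpy as np


def make_conditional(joint):
    """Build sample_conditional(state, i, rng) from a joint over binary tuples.
    Stand-in for asking an LLM for p(x_i | x_-i)."""
    def sample_conditional(state, i, rng):
        weights = []
        for value in (0, 1):
            s = list(state)
            s[i] = value
            weights.append(joint[tuple(s)])
        p_one = weights[1] / (weights[0] + weights[1])
        return int(rng.random() < p_one)
    return sample_conditional


def gibbs(sample_conditional, init, n_sweeps, rng):
    state = list(init)
    samples = []
    for _ in range(n_sweeps):
        # A fresh random visiting order each sweep avoids a fixed-order bias.
        for i in rng.permutation(len(state)):
            state[i] = sample_conditional(state, int(i), rng)
        samples.append(tuple(state))
    return samples
```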
21
raw/papers/latent-cot-supervision-2026.md
Normal file
@@ -0,0 +1,21 @@
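# What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis

- **arXiv**: 2606.20075v1
- **Published**: 2026-06-18
- **Authors**: Xinghao Chen, Chak Tou Leong, Wenjin Guo, Jian Wang, Wenjie Li, Xiaoyu Shen (Eastern Institute of Technology / Hong Kong Polytechnic University)
- **Categories**: cs.LG, cs.CL
- **Venue**: ICML 2026
- **Code**: https://github.com/EIT-NLP/Supervision-in-Latent-CoT
- **Source**: https://arxiv.org/abs/2606.20075

## Abstract

The paper analyzes, from an information-theoretic perspective, what makes supervision of Latent Chain-of-Thought effective. It identifies a "dual collapse" under outcome supervision: gradient decay and representation drift. It decomposes process supervision into two complementary dimensions:

- **Trajectory Supervision** injects dense step-by-step reasoning signals.
- **Space Supervision** preserves the semantic structure of the latent space through generative reconstruction.

It proposes the Unified Latent Probe (ULP) to quantify the mutual information between latent trajectories and explicit reasoning steps. Experiments reveal an Information-Performance Binding: reasoning accuracy is strictly bounded by the information fidelity preserved in the latent chain.

## Key Contributions

1. Information-theoretic framework: formalizes Latent CoT supervision as a mutual-information maximization problem
2. Dual-collapse diagnosis: gradient decay + representation drift are the root causes of outcome-supervision failure
3. Two-dimensional decomposition of process supervision: Trajectory Supervision × Space Supervision
4. ULP probe: quantifies the reasoning information recoverable from latent states
5. Information-Performance Binding: reasoning ability is strictly bounded by information fidelity
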
31
raw/papers/longmem-eval-2025.md
Normal file
@@ -0,0 +1,31 @@
---
title: "LongMemEval: Benchmarking Long-Term Interactive Memory (Raw Archive)"
created: 2026-06-25
updated: 2026-06-25
type: raw
tags: ["memory-benchmark", "chat-assistant", "long-term-memory"]
source: "https://arxiv.org/abs/2410.10813"
---

# LongMemEval — Raw Archive

## Metadata

- **Title**: LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
- **Authors**: Di Wu (UCLA), Hongwei Wang, Wenhao Yu (Tencent AI Lab Seattle), Yuwei Zhang (UC San Diego), Kai-Wei Chang (UCLA), Dong Yu (Tencent AI Lab Seattle)
- **Venue**: ICLR 2025
- **arXiv**: 2410.10813
- **Date**: 2024-10-14 (v1), 2025-03-04 (v2)
- **Category**: cs.CL
- **Code**: https://github.com/xiaowu0162/LongMemEval

## Abstract

Recent large language model (LLM)-driven chat assistant systems have integrated memory components to track user-assistant chat histories, enabling more accurate and personalized responses. However, their long-term memory capabilities in sustained interactions remain underexplored. We introduce LongMemEval, a comprehensive benchmark designed to evaluate five core long-term memory abilities of chat assistants: information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention. With 500 meticulously curated questions embedded within freely scalable user-assistant chat histories, LongMemEval presents a significant challenge to existing long-term memory systems, with commercial chat assistants and long-context LLMs showing a 30% accuracy drop on memorizing information across sustained interactions. We then present a unified framework that breaks down the long-term memory design into three stages: indexing, retrieval, and reading.

## Key Contributions

1. First comprehensive memory benchmark covering 5 core abilities, including abstention
2. Unified three-stage memory framework (indexing → retrieval → reading) with four control points
3. Empirically validated design optimizations: round granularity, fact-augmented keys, time-aware query expansion
4. Two standard settings: S (~115k tokens) and M (~1.5M tokens)

73
raw/papers/maineCoon-social-world-model-2026.md
Normal file
@@ -0,0 +1,73 @@
---
title: "MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model"
created: 2026-06-20
source: "arXiv:2606.17800"
authors: "Lichen Bai, Tianhao Zhang, Shitong Shao, Dingwei Tan, Qiyu Zhong, Zhengpeng Xie, Haopeng Li, Qinghao Huang, Dandan Shen, Tengjiao Ji, Wei Wang, Peicheng Wu, Yuxuan Zhao, Xiangyu Zhu, Welly Luo, Shurui Yang, Zeke Xie"
venue: "arXiv preprint (cs.CV)"
date: "2026-06-16"
project: "https://mainecoon.tech/"
type: paper
---

# MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

**Catnip AI Team** · arXiv:2606.17800 · 32 pages, 13 figures, 3 tables

## Abstract

As an increasing majority of global video content is consumed on social platforms for interactive social purposes, video generation models built for social worlds are important but largely overlooked. We define the position of **social world models** and build MaineCoon as the first step — a 22B real-time audio-visual autoregressive model capable of streaming generation and sub-second interaction at up to **47.5 FPS** on a single GPU.

Key innovations:
- **Self-resampling**: exposes model to degraded self-history during training
- **Cross-modal representation alignment**: token relation distillation with V-JEPA 2
- **Domain-aware preference optimization**: multi-domain LoRA DPO experts
- **Reinforced online-policy distillation (ROPD)**: consolidates domain experts into one deployable policy
- **Agentic streaming inference**: training-free framework with planner/observer, cache manager, buffer controller

MaineCoon supports thousand-second-scale generation while mitigating drift, and sets SOTA on the new **SocialVideo Bench** (9 evaluation metrics).

## Core Problem

Most video worldwide is consumed on social platforms, yet existing video generation models (e.g., DiT diffusion models) have three major limitations:
1. **Offline, non-streaming**: bidirectional temporal attention rules out real-time output
2. **Audio ignored**: speech, lip sync, and emotional resonance are central to social video
3. **Poor long-horizon stability**: content drifts during minute-scale autoregressive generation

## Method

### Training Pipeline (Section 3)
- **Native Streaming AR Training (3.1)**: causal, chunk-wise autoregressive training; [[self-resampling|Self-Resampling]] adapts the model to the degraded history it produces itself
- **Cross-modal Representation Alignment (3.2)**: token relation distillation from a [[jepa|V-JEPA 2]] teacher to speed up training
- **Post-training (3.3)**: [[domain-aware-preference-optimization|Domain-Aware DPO]] trains domain experts, and [[reinforced-online-policy-distillation|ROPD]] merges the experts into a single policy
- **Step Distillation**: DMD-based four-step distillation for nearly lossless fast inference

### Agentic Streaming Inference (Section 4)
A training-free inference framework in which three controllers wrap the frozen generator:
- **[[agentic-streaming-inference|Director]] (Planner & Observer)**: a Gemma 4 26B agent writes the prompt stream and observes generation quality
- **[[agentic-cache-manager|Cache Manager]]**: manages the KV-cache keep-set and drift control
- **[[look-ahead-buffer-controller|Buffer Controller]]**: controls how far generation runs ahead of playback

### Data Pipeline (Section 2)
- Synthetic data via LTX-2.3 teacher + director-style LM scenario planning (225 scenes × 15 styles × 12 shots)
- Real social video curation: SCRFD face detection → SyncNet lip-sync verification → quality filtering
- Daily throughput: on the order of 100,000 videos

## Key Results

- **47.5 FPS** on single H100 GPU
- **<$0.001 per second** generation cost
- **45 minutes** continuous streaming without measurable degradation
- SOTA on SocialVideo Bench across 9 metrics vs. 7 open-source baselines
- Training efficiency: <10K GPU hours, <1M clips

## Related Concepts
- [[social-world-model|Social World Model]]
- [[self-resampling|Self-Resampling]]
- [[reinforced-online-policy-distillation|ROPD]]
- [[agentic-streaming-inference|Agentic Streaming Inference]]
- [[agentic-cache-manager|Agentic Cache Management]]
- [[look-ahead-buffer-controller|Look-Ahead Buffer Control]]
- [[forward-repair-ladder|Forward Repair Ladder]]
- [[socialvideo-bench|SocialVideo Bench]]
- [[audio-visual-representation-alignment|Audio-Visual Representation Alignment]]
- [[domain-aware-preference-optimization|Domain-Aware Preference Optimization]]

40
raw/papers/me2-trm-reasoning-2026.md
Normal file
@@ -0,0 +1,40 @@
---
title: "Characterizing, Evaluating, and Optimizing Complex Reasoning (ME² + TRM)"
author: "Haoran Zhang, Yafu Li, Zhi Wang, Zhilin Wang, Shunkai Zhang, Xiaoye Qu, Yu Cheng"
source: "arXiv 2602.08498v2"
date: "2026-02-09 (updated 2026-06-03)"
type: paper
venue: "ICML 2026 (cs.CL)"
tags: ["reasoning", "reward-model", "dag", "grpo", "test-time-scaling", "rl"]
code: "https://github.com/Simplified-Reasoning/TRM"
---

# Characterizing, Evaluating, and Optimizing Complex Reasoning

> Zhang, Li, Wang, Wang, Zhang, Qu, Cheng | SJTU / Shanghai AI Lab / CUHK / NJU / USTC / PKU
> ICML 2026 | arXiv:2602.08498v2 | cs.CL

## Three Core Questions

1. **Q1**: What defines high-quality reasoning?
2. **Q2**: How can long, implicitly structured reasoning traces be evaluated reliably?
3. **Q3**: How can this evaluation signal be used to optimize reasoning?

## Approach

### The ME² Principle
Characterizes reasoning quality along two orthogonal axes:
- **Macro vs. Micro**: global structural organization vs. local step-level properties
- **Effectiveness vs. Efficiency**: how effective the reasoning is vs. how efficient it is

### DAG Modeling of Reasoning
Abstracts a reasoning trace as a directed acyclic graph (DAG) that explicitly models progression, branching, and merging. A DAG is a practical middle ground between a tree and a complete graph. It captures rich structure while keeping a topological order consistent with the generation order.

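The following small illustration shows why a DAG fits reasoning traces, using a hypothetical five-step trace. When every edge points from an earlier step to a later one, the generation order is itself a valid topological order. Branches (a step reused by several later steps) and merges (a step with several parents) are then easy to read off. Python's standard `graphlib` produces an order to check against.

```python
from graphlib import TopologicalSorter

# Hypothetical reasoning trace: step -> earlier steps it builds on.
trace = {
    0: [],      # restate the problem
    1: [0],     # approach A
    2: [0],     # approach B (branch from step 0)
    3: [1, 2],  # combine both approaches (merge)
    4: [3],     # final answer
}


def respects_generation_order(dag):
    # Every edge runs from an earlier step to a later one, so the graph is
    # acyclic and generation order is already a topological order.
    return all(parent < child for child, parents in dag.items() for parent in parents)


def branch_points(dag):
    return [n for n in dag if sum(n in parents for parents in dag.values()) > 1]


def merge_points(dag):
    return [n for n, parents in dag.items() if len(parents) > 1]


order = list(TopologicalSorter(trace).static_order())
```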
### Thinking Reward Model (TRM)
- Builds the TRM-Preference dataset (103K training pairs) from ME² + DAG pairwise evaluation
- Trains a lightweight TRM with a Bradley-Terry objective (Llama-3.1-8B → scalar head)
- Key point: TRM is trained only on preference pairs of verified-correct reasoning, which decouples it from answer-correctness supervision

### Optimization Signals
- Test time: Best-of-N selection → +19.3% (AIME24, Qwen3-8B)
- Training: TRM-guided GRPO with gated reward shaping → +3.9%

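The sketch below covers two pieces: the Bradley-Terry objective used to train TRM, and the Best-of-N selection that the trained model enables at test time. Rewards are plain floats standing in for the TRM's scalar head.

```python
import numpy as np


def preference_prob(r_a, r_b):
    # Bradley-Terry: P(a preferred over b) = sigmoid(r_a - r_b)
    return 1.0 / (1.0 + np.exp(-(r_a - r_b)))


def bradley_terry_loss(r_chosen, r_rejected):
    # Mean of -log sigmoid(r_chosen - r_rejected), computed stably as softplus(-margin).
    margin = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return float(np.mean(np.logaddexp(0.0, -margin)))


def best_of_n(candidates, reward_fn):
    # Test-time use: keep the candidate the reward model scores highest.
    return max(candidates, key=reward_fn)
```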
41
raw/papers/mozer-topological-trouble-transformers-2026.md
Normal file
@@ -0,0 +1,41 @@
---
title: "The Topological Trouble With Transformers"
source: arXiv
source_id: 2604.17121
authors:
- Michael C. Mozer (Google DeepMind)
- Shoaib Ahmed Siddiqui (Google DeepMind)
- Rosanne Liu (Google DeepMind)
published: 2026-04-18
updated: 2026-06-03
categories:
- cs.LG
- cs.AI
venue: Preprint
---

# The Topological Trouble With Transformers

## Abstract
Transformers encode structure in sequences via an expanding contextual history. However, their purely feedforward architecture fundamentally limits dynamic state tracking. State tracking—the iterative updating of latent variables reflecting an evolving environment—involves inherently sequential dependencies that feedforward networks struggle to maintain. Consequently, feedforward models push evolving state representations deeper into their layer stack with each new input step, rendering information inaccessible in shallow layers and ultimately exhausting the model's depth.

While this depth limit can be bypassed by dynamic depth models and by explicit or latent thinking that externalizes state representations, these solutions are computationally and memory inefficient. The authors argue that temporally extended cognition requires refocusing from explicit thought traces to implicit activation dynamics via recurrent architectures.

## Core Contributions
1. **Topological analysis** of why feedforward Transformers fundamentally cannot track state indefinitely
2. **Taxonomy of recurrent Transformer architectures** along two dimensions: recurrence axis (depth vs step) and input-tokens-per-recurrence-step ratio
3. **Identification of empty cells** in the taxonomy as promising research directions
4. **Critique of Chain-of-Thought as workaround** — it externalizes what should be implicit
5. **Roadmap** for enhanced SSMs, coarse recurrence, representational alignment, and efficient recurrence training

## Key Concepts
- state tracking, belief state, depth dilemma
- recurrent transformer architectures (depth/step/both)
- recurrence taxonomy: axis × ratio
- attractor dynamics, latent thought models
- enhanced state-space models (DeltaNet, RWKV-7, PaTH attention)
- representational alignment, coarse-grained recurrence
- sequential dependency, autoregressive unrolling

## URL
https://arxiv.org/abs/2604.17121

90
raw/papers/peng-rwkv7-goose-2025.md
Normal file
@@ -0,0 +1,90 @@
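---
title: "RWKV-7 \"Goose\" with Expressive Dynamic State Evolution"
authors: ["Bo Peng", "Ruichong Zhang", "Daniel Goldstein", "Eric Alcaide", "et al."]
date: 2025-03-18
arxiv_id: "2503.14456v2"
categories: ["cs.CL", "cs.AI", "cs.LG"]
affiliations: ["RWKV Project (Linux Foundation AI & Data)", "EleutherAI", "Tsinghua University", "et al."]
paper_type: "preprint"
code: "https://github.com/RWKV/RWKV-LM"
models: "https://huggingface.co/RWKV"
---

# RWKV-7 "Goose" with Expressive Dynamic State Evolution

## Abstract

RWKV-7 "Goose" is a new sequence-modeling architecture with constant memory usage and constant inference time per token. It was trained on far fewer tokens than comparable top models. Even so, its 2.9B-parameter language model sets a new 3B state of the art on multilingual tasks and matches the current 3B state of the art on English downstream performance. RWKV-7's core innovations are:

1. A generalized delta rule with **vector-valued gating** and **in-context learning rates**.
2. A relaxed value replacement rule that uses separate keys for removal and addition.

Theoretically, RWKV-7 can perform state tracking and recognize **all regular languages**, going beyond the TC^0 limit of Transformers. The release includes a 3.1T-token multilingual corpus and four pretrained models (0.19B-2.9B), all under Apache 2.0.

## Core Contributions

1. **Generalized delta rule**: extends DeltaNet's scalar delta rule to vector-valued gating and in-context learning rates
2. **Relaxed value replacement rule**: decouples the removal key (k_remove) from the add key (k_add), allowing more flexible state updates
3. **Expressivity beyond TC^0**: proves that RWKV-7 can recognize all regular languages, a class that includes NC^1-complete problems; a single layer already solves S5 state tracking
4. **Model upgrading**: trains from RWKV-5/6 checkpoints instead of pretraining from scratch, saving compute
5. **RWKV World v3 dataset**: a 3.1T-token open multilingual corpus

## Method

### From DeltaNet to the Generalized Delta Rule

The conventional delta rule (DeltaNet) takes one gradient step on a per-token regression loss:

```
S_t = S_{t-1} - α · ∇_S ℓ(S_{t-1}, k_t, v_t)
# with ℓ(S, k, v) = ½‖S k^T - v^T‖², this becomes
S_t = S_{t-1} (I - α k_t^T k_t) + α v_t^T k_t
```

RWKV-7's generalized delta rule introduces three innovations:

**1. Vector-valued gating**:

```
S_t = S_{t-1} · (diag(w_t) - κ̂_t^T (a_t ⊙ κ̂_t)) + v_t^T · k̃_t
```

Here w_t is the dynamic (flexible) decay, a_t is the vector-valued in-context learning rate, κ̂_t is the normalized removal key, and k̃_t is the add (replacement) key.

**2. Vector-valued in-context learning rate**:

a_t is upgraded from a scalar to a d-dimensional vector, letting the model selectively replace state data **channel by channel**.

**3. Generalized eigenvalues**:

The evolution matrix can have eigenvalues outside [0, 1], which makes it more expressive than standard SSMs.
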
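A minimal single-head sketch of the RWKV-7 state update above, in the same row-vector convention. It ignores token shift, multi-head structure, and how w, a, κ̂ and k̃ are produced from the input. The tests check two properties that follow directly from the formula:

- With w = 1, a = 1 and k̃ = κ̂ (a unit vector), reading the state with κ̂ returns v exactly (full replacement).
- With a = α·1, κ̂ = k and k̃ = α·k, the update reduces to the plain delta-rule gradient step.

```python
import numpy as np


def rwkv7_update(S, w, kappa_hat, a, k_tilde, v):
    """S_t = S_{t-1} (diag(w) - kappa_hat^T (a * kappa_hat)) + v^T k_tilde.

    Row-vector convention: S is (d_v, d_k); w, kappa_hat, a, k_tilde are (d_k,);
    v is (d_v,). kappa_hat is the (normalized) removal key, k_tilde the add key.
    """
    transition = np.diag(w) - np.outer(kappa_hat, a * kappa_hat)  # (d_k, d_k)
    return S @ transition + np.outer(v, k_tilde)


def read(S, r):
    # Query the state with a key-space vector r, i.e. S r^T.
    return S @ r
```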
### Comparison with Other Architectures

| Architecture | Large state | Flexible decay | Dynamic dependence | Generalized eigenvalue |
|------|--------|---------|---------|----------|
| RWKV-4 | ✗ | ✗ | ✗ | ✗ |
| Mamba | ✗ | ✓ | ✓ | ✗ |
| RWKV-6 / GLA | ✗ | ✓ | ✓ | ✗ |
| Gated DeltaNet | ✓ | ✗ | ✓ | ✓ |
| **RWKV-7** | ✓ | ✓ | ✓ | ✓ |

### Theoretical Breakthrough

RWKV-7 is the **first** parallelizably trainable RNN architecture proven to **go beyond TC^0** (assuming TC^0 ≠ NC^1):

- A single layer can solve S5 state tracking (an NC^1-complete problem)
- A constant number of layers can recognize any regular language
- Standard Transformers are limited to TC^0

## Experimental Results

- **2.9B multilingual**: multilingual SoTA at the 3B scale; matches the current 3B SoTA in English
- **Training efficiency**: far fewer training tokens than models of the same size
- **Long context**: constant memory; per-token inference cost does not grow with sequence length
- **Associative recall**: clearly outperforms RWKV-6 on the synthetic task

## Key Concepts

- [[delta-rule]] → [[generalized-delta-rule]]: how the delta rule evolved
- [[vector-valued-gating]]: RWKV-7's vector-valued gating mechanism
- [[in-context-learning-rate]]: per-channel in-context learning rate
- [[dynamic-state-evolution]]: the dynamic state evolution mechanism
- [[token-shift]]: the time-mixing trick of the RWKV family
- [[regular-language-recognition]]: the theoretical result, recognizing all regular languages
- [[wkv-time-mixing]]: RWKV-7's WKV time-mixing mechanism

## References

- Code: https://github.com/RWKV/RWKV-LM
- Models: https://huggingface.co/RWKV
- DeltaNet (Schlag et al., 2021)
- RWKV-6 / Finch (Peng et al., 2024)

40
raw/papers/personalization-trap-2025.md
Normal file
@@ -0,0 +1,40 @@
---
title: "The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs"
author: "Xi Fang*, Weijie Xu*, Yuchong Zhang, Stephanie Eckman, Scott Nickleach, Chandan K. Reddy (Amazon)"
source: "arXiv 2510.09905v2"
date: "2025-10-10 (updated 2026-06-16)"
type: paper
venue: "arXiv (cs.AI, cs.CL)"
tags: ["personalization", "memory", "emotional-intelligence", "bias", "social-capital", "dpo"]
code: "https://github.com/personalization-trap"
dataset: "Datasets Repository"
---

# The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs

> Xi Fang*, Weijie Xu*, Yuchong Zhang, Stephanie Eckman, Scott Nickleach, Chandan K. Reddy
> Amazon | arXiv:2510.09905v2 | cs.AI / cs.CL

## Core Question

Suppose an AI assistant remembers that "Sarah is a single mother working two jobs". Does it interpret her stress differently than if it remembered "Sarah is a wealthy executive"? Personalized AI systems increasingly incorporate long-term user memory, but how this affects emotional reasoning had not been studied.

## Method

1. **User profile generation**: based on Bourdieu's social capital framework, each of 30 base profiles gets an advantaged and a disadvantaged version, plus 81 intersectional profiles (gender × age × religion × race)
2. **Emotional understanding evaluation**: STEU (42 emotion-recognition scenarios) plus an adapted STEM (44 first-person emotional-advice scenarios); human experts validated the items and removed those sensitive to the profiles
3. **Statistical modeling**: mixed-effects models estimate demographic effects

## Key Findings

**Finding 1**: User memory systematically affects emotional understanding. 11 of 15 models deviate significantly from the no-memory baseline. Claude 3.7 Sonnet scores 80.10% with advantaged profiles vs. 77.37% with disadvantaged profiles (p<0.05).

**Finding 2**: Demographic biases are significant. Muslim, non-binary, and 65+ profiles score lower. Claude 3.7 gives significantly worse emotional advice to women and non-binary users than to men. However, the direction of bias varies by model, with no uniform pattern.

**Finding 3**: "Thinking" models show less bias than their standard versions, but bias persists in the emotional-advice task.

**Finding 4**: DPO training on a carefully curated preference dataset (500 samples) reduces the bias while preserving general capabilities. Gemma-2-2B's Bias Influence drops from 5.50% to -2.30%.

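For reference, the standard DPO objective can be computed on preference pairs from sequence log-probabilities under the trained policy and a frozen reference model. β and the inputs are illustrative, and this note does not reproduce the paper's exact training setup.

```python
import numpy as np


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Mean DPO loss over preference pairs, from sequence log-probabilities."""
    chosen = np.asarray(logp_chosen, dtype=float) - np.asarray(ref_logp_chosen, dtype=float)
    rejected = np.asarray(logp_rejected, dtype=float) - np.asarray(ref_logp_rejected, dtype=float)
    margin = beta * (chosen - rejected)                 # implicit reward margin
    return float(np.mean(np.logaddexp(0.0, -margin)))  # -log sigmoid(margin)
```

When the policy equals the reference, the loss is log 2. It falls as the policy raises the preferred response's likelihood relative to the rejected one.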
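## Core Insight

"The memory that knows who you are should never decide how much it cares about you." Personalization can inadvertently encode social hierarchy into AI's emotional reasoning.
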
59
raw/papers/tang-lukv-2026.md
Normal file
@@ -0,0 +1,59 @@
---
title: "Predicting Future Utility: Global Combinatorial Optimization for Task-Agnostic KV Cache Eviction"
authors: ["Ziyao Tang", "Pengkun Jiao", "Xinhang Chen", "Wei Liu", "Shiyong Li", "Jingjing Chen"]
date: 2026-02-09
arxiv_id: "2602.08585v2"
categories: ["cs.LG", "cs.AI"]
venue: "ICML 2026"
affiliations: ["Fudan University", "Baidu Inc. (Baige AI Team)"]
paper_type: "conference"
---

# Predicting Future Utility: Global Combinatorial Optimization for Task-Agnostic KV Cache Eviction

## Abstract

The linear memory growth of the KV cache is the core bottleneck of long-context inference in large models. Existing KV cache eviction methods rely on instantaneous heuristic metrics and assume that attention scores are an equally good proxy for importance in every head. In practice, attention heads differ in predictive fidelity: some capture immediate contributions, while others capture long-horizon utility. This paper proposes LU-KV, a framework that:

- casts head-level budget allocation as a global combinatorial optimization problem;
- obtains near-optimal solutions via a convex-hull relaxation and a marginal-utility greedy solver;
- adds an offline profiling protocol for practical deployment.

On LongBench and RULER it incurs minimal performance loss at an 80% KV cache compression rate.

## Core Contributions

1. Identifies the key gap (optimality gap) between heuristic importance metrics and long-horizon marginal utility
2. Formalizes budget allocation as long-term utility maximization and proposes a convex-hull relaxation + marginal-utility greedy solver
3. Designs a data-driven offline profiling protocol that makes the theoretical optimization deployable in real inference
4. Metric-agnostic: works with various intra-head scoring methods such as SnapKV, KeyDiff, CAKE, and KVZip

## Key Concepts

- [[oracle-importance]]: a token's maximum potential contribution to the output vectors over a future decoding window
- [[optimality-gap]]: the gap between heuristic metrics and the oracle metric
- [[long-horizon-utility]]: long-horizon utility, as opposed to instantaneous attention scores
- [[global-combinatorial-optimization]]: the combinatorial-optimization formulation of global budget allocation
- [[convex-hull-relaxation]]: convex relaxation of discrete loss sequences via isotonic-regression methods such as PAVA
- [[marginal-utility]]: marginal utility, which drives the greedy allocation
- [[offline-profiling]]: three-stage offline calibration: synthetic contexts → oracle computation → profile aggregation

## Experimental Results

- LongBench: at 80% compression, LU-KV consistently beats baselines such as Uniform, PyramidKV, and AdaKV on Llama-3.1-8B, Mistral-7B, and Qwen2.5-32B
- RULER: retrieval stays robust across extended context windows from 4K to 128K
- Offline profiles transfer consistently across tasks
- Compatible with multiple intra-head metrics, including SnapKV, KeyDiff, CAKE, and KVZip

## Method

LU-KV follows a two-stage paradigm:
1. **Intra-head scoring**: scores and ranks tokens with any heuristic metric π
2. **Cross-head budget allocation**: determines the optimal budget b_{ℓ,h} for each head via global combinatorial optimization

Core decomposition: `Eviction Loss = Oracle Metric Loss + Optimality Gap Loss`

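A sketch of the marginal-utility greedy step. It assumes each head's profiled loss curve (loss as a function of its token budget) is already non-increasing and convex, which is what the convex-hull relaxation is meant to provide; the relaxation itself is not shown. Under that assumption the total loss is a sum of convex per-head terms. Repeatedly giving the next token to the head with the largest loss reduction is then optimal. The curves in the example are made up for illustration.

```python
import heapq


def greedy_budget_allocation(loss_curves, total_budget):
    """loss_curves[h][b]: profiled eviction loss of head h when it keeps b tokens.
    Assumes each curve is non-increasing and convex in b."""
    budgets = [0] * len(loss_curves)
    # Max-heap (via negation) of the gain from giving head h one more token.
    heap = [(-(c[0] - c[1]), h) for h, c in enumerate(loss_curves) if len(c) > 1]
    heapq.heapify(heap)
    for _ in range(total_budget):
        if not heap:
            break  # every head already keeps all of its tokens
        _, h = heapq.heappop(heap)
        budgets[h] += 1
        b, curve = budgets[h], loss_curves[h]
        if b + 1 < len(curve):
            heapq.heappush(heap, (-(curve[b] - curve[b + 1]), h))
    return budgets
```

For instance, with curves `[10, 5, 3, 2.5]` and `[4, 3.9, 3.85, 3.8]` and a budget of 3, the allocation is `[3, 0]`. The first head's marginal gains (5, 2, 0.5) all beat the second head's (0.1).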
## References

- SnapKV (Li et al., 2024)
- H2O (Zhang et al., 2023)
- PyramidKV (Cai et al., 2024)
- AdaKV (Feng et al., 2026b)
- KeyDiff (Park et al., 2025)
- CriticalKV (Feng et al., 2025)
- KVZip (Kim et al., 2026)
- CAKE (Qin et al., 2025)

45
raw/papers/unlimited-ocr-works-2026.md
Normal file
@@ -0,0 +1,45 @@
---
title: "Unlimited OCR Works: Welcome the Era of One-shot Long-horizon Parsing"
author: "Youyang Yin, Huanhuan Liu*, YY†, et al. (Baidu Inc.)"
source: "arXiv 2606.23050"
date: "2026-06-22"
type: paper
venue: "arXiv (cs.CV, cs.CL)"
tags: ["ocr", "attention-mechanism", "long-horizon", "kv-cache", "r-swa", "end-to-end"]
code: "https://github.com/baidu/Unlimited-OCR"
---

# Unlimited OCR Works

> Youyang Yin, Huanhuan Liu*, YY†, Qunyi Xie, Chaorun Liu, Shiqi Yang, Shaohua Wang, Zhanlong Liu, Hao Zou, Jinyue Chen, Shu Wei, Jingjing Wu, Mingxin Huang, Zhen Wu, Guibin Wang, Tengyu Du, Lei Jia
> Baidu Inc. | arXiv:2606.23050 | Jun 2026

## Core Problem

Existing end-to-end OCR models (e.g., DeepSeek OCR) use an LLM as the decoder and exploit language priors to improve accuracy. The cost is that the KV cache grows linearly with the output sequence, so inference keeps slowing down. Humans do not slow down over long transcription tasks, which points to a fundamental architectural bottleneck.

## Core Approach: Reference Sliding Window Attention (R-SWA)

Proposes **R-SWA**, an attention mechanism modeled on the working memory humans use when parsing documents:

1. Each generated token attends to all reference tokens (visual tokens + prompt) plus the previous n output tokens (default n=128)
2. Reference tokens do not take part in the state transition, so the visual features do not gradually blur
3. The KV cache stays at a constant size of Lm + n (the reference tokens plus the n-token output window) and does not grow with decoding length
4. Inference speed (TPS) and GPU memory stay constant throughout decoding

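A sketch of the R-SWA attention pattern as a boolean mask. It assumes the n-token window includes the current output token, and the function name is hypothetical. Every row sees all reference tokens plus at most n recent outputs. The number of keys per query, and hence the KV cache that must be kept, is therefore bounded by the reference length plus n.

```python
import numpy as np


def r_swa_mask(n_ref, n_out, window):
    """Boolean mask; rows = output-token queries, columns = [reference | outputs].
    True means the query may attend to that key."""
    mask = np.zeros((n_out, n_ref + n_out), dtype=bool)
    mask[:, :n_ref] = True                # all reference tokens, always
    for i in range(n_out):
        start = max(0, i - window + 1)    # last `window` outputs, including token i
        mask[i, n_ref + start : n_ref + i + 1] = True
    return mask
```

Because row sums are capped at `n_ref + window`, only the reference tokens and the most recent `window` outputs ever need to stay in the cache. This is where the constant TPS and memory come from.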
## 关键结果
|
||||
|
||||
- 以 DeepSeek OCR 为基线,替换所有 decoder attention 为 R-SWA
|
||||
- OmniDocBench v1.5:**93% Overall**,比 DeepSeek OCR 基线高 6pp
|
||||
- OmniDocBench v1.6:与 SOTA 持平(93.54%)
|
||||
- 长程解析:2-40+ 页书籍,Distinct-n > 96%,Edit Distance < 0.11
|
||||
- 推理效率:6000 token 时 TPS 比 DeepSeek OCR 高 35%
|
||||
- 3B 参数,MoE 架构,激活仅 500M
|
||||
|
||||
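## Limitations

Parsing is still bounded by the prefill length (currently 32K), so it is not truly unlimited. Short-term direction: train with a 128K context. Long-term direction: build a prefill pool that simulates turning pages.
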
## Generality

R-SWA is a general-purpose attention mechanism for parsing: beyond OCR, it also applies to ASR, translation, and other reference-based long-horizon tasks.

41
raw/papers/vla-jepa-2026.md
Normal file
@@ -0,0 +1,41 @@
---
title: "VLA-JEPA: Enhancing VLA with Latent World Model"
author: "Jingwen Sun*, Wenyao Zhang*, Zekun Qi, Shaojie Ren, Zezhi Liu, Hanxin Zhu, Guangzhong Sun, Xin Jin†, Zhibo Chen†"
source: "arXiv 2602.10098v2"
date: "2026-02-10 (updated 2026-02-14)"
type: paper
venue: "arXiv (cs.RO, cs.CV)"
tags: ["vla", "jepa", "world-model", "robot-learning", "pretraining", "latent-action"]
code: "https://github.com/ginwind/VLA-JEPA/"
---

# VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model

> Sun*, Zhang*, Qi, Ren, Liu, Zhu, Sun, Jin†, Chen†
> USTC / SJTU / Tsinghua / EIT / UCAS / Nankai | arXiv:2602.10098v2 | cs.RO / cs.CV

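## Core Problem

The latent-action pretraining objectives used by current VLAs learn the wrong thing: they are anchored to pixel changes rather than to action-relevant state transitions, which leads to four failure modes:

1. Pixel-level objectives favor appearance over action semantics
2. In real-world video, camera motion and background changes dominate the signal
3. Information leakage lets the latent action collapse into a shortcut (encoding the future rather than the transition dynamics)
4. Multi-stage training pipelines are complex and brittle
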
## Core Approach: Leakage-free State Prediction

VLA-JEPA brings the JEPA paradigm into VLA pretraining:

- A target encoder produces latent targets from future frames (used only as supervision, never as input)
- The student sees only the current observation
- Prediction happens in latent space rather than pixel space, which makes it inherently robust to camera motion and background changes
- A simple two-stage recipe: JEPA pretraining → action-head fine-tuning

Architecture: Qwen3-VL-2B (VLM backbone) + V-JEPA2 encoder (world model) + Flow-Matching action head

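The leakage-free objective can be sketched as a data-flow constraint: the future frame reaches only the target encoder, and the student is trained to match that target in latent space. This is a toy plain-Python illustration, not the VLA-JEPA code; the encoder and student are stand-in functions, and the squared-error loss is an assumption.

```python
from typing import Callable, List, Sequence

Latent = List[float]
Encoder = Callable[[Sequence[float]], Latent]


def latent_prediction_loss(student: Encoder, target_encoder: Encoder,
                           current_obs: Sequence[float],
                           future_obs: Sequence[float]) -> float:
    """Mean squared error between the student's prediction and the future-frame latent."""
    # The future frame is seen only by the target encoder; its output is a fixed
    # label (a real training loop would apply stop-gradient here).
    target = target_encoder(future_obs)
    # The student receives only the current observation, so it has no path
    # through which to copy the future frame into its prediction.
    prediction = student(current_obs)
    if len(prediction) != len(target):
        raise ValueError("prediction and target latents must have the same size")
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


def toy_target_encoder(frame: Sequence[float]) -> Latent:
    """Stand-in for the target encoder: a fixed linear map."""
    return [2.0 * v for v in frame]


def toy_student(obs: Sequence[float]) -> Latent:
    """Stand-in predictor whose guess is off by a constant offset."""
    return [2.0 * v + 1.0 for v in obs]


print(latent_prediction_loss(toy_student, toy_target_encoder, [0.0, 1.0], [1.0, 1.0]))  # 1.0
```

If the student's signature also accepted `future_obs`, it could drive the loss to zero by copying, which is the shortcut described in failure mode 3.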
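## Key Results

- **LIBERO**: SOTA average success rate, best on 2 of the 4 task suites
- **SimplerEnv**: highest average success rate on Google Robot, second on WidowX
- **LIBERO-Plus**: strong robustness across 7 perturbation dimensions
- **Data efficiency**: better performance while using far less training data than the compared methods
- **Real-world Franka**: successfully validated on a real robot
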
45
raw/papers/vu-fisher-width-2026.md
Normal file
@@ -0,0 +1,45 @@
---
title: "Fisher Width: A Geometric Measure of Complexity on Statistical Manifolds"
source_id: "arXiv:2606.18306v1"
authors:
- "Vu Khac Ky"
affiliations: "Department of Mathematics, FPT University, Vietnam"
date: 2026-06-16
categories: ["cs.LG", "stat.ML"]
pages: 48
figures: 3
url: "https://arxiv.org/abs/2606.18306v1"
---

# Fisher Width: A Geometric Measure of Complexity on Statistical Manifolds

**Authors**: Vu Khac Ky (FPT University, Vietnam)
**arXiv**: 2606.18306v1 | **Date**: 2026-06-16
**Categories**: cs.LG (Machine Learning), stat.ML (Statistics: Machine Learning)
**48 pages, 3 figures**

## Abstract

Gaussian width is a central geometric complexity measure in high-dimensional probability, compressed sensing, convex optimization, and learning theory. It quantifies the average extent of a set along random directions, thereby capturing the effective dimension of constraint sets, hypothesis classes, and descent cones. However, this notion is intrinsically Euclidean. Statistical models instead carry a natural Riemannian geometry induced by the Fisher information metric, where directions are scaled according to statistical distinguishability rather than ambient Euclidean length.

We introduce **Fisher width**, a Fisher-geometric analogue of Gaussian width for statistical manifolds. At a parameter point θ, Fisher width replaces the Euclidean identity by the local metric tensor G(θ)^{1/2}, measuring the Gaussian width of the Fisher-rescaled set. This makes the resulting quantity sensitive to local statistical curvature and invariant under smooth reparameterizations.

We develop the basic theory of Fisher width, showing that it retains key structural features of Gaussian width, including concentration, metric perturbation stability, and spectral comparison bounds with the Euclidean baseline, while also capturing anisotropic geometric effects invisible to Euclidean measures. As an application, we prove a generalization bound for Fisher-Lipschitz hypothesis classes and propose computable estimators, which we evaluate empirically on MNIST across three model classes.

Fisher width is to statistical manifolds what Gaussian width is to Euclidean convex bodies. This work lays the foundation for studying complexity and learning on curved statistical manifolds.

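The definitions in the abstract can be written out as follows. The first two lines restate Gaussian width and the lifting identity, with $G_\theta$ short for $G(\theta)$. The reparameterization lines are a short derivation under stated assumptions, not quoted from the paper: $\theta = \varphi(\eta)$ with invertible Jacobian $J$, a Fisher metric that transforms as $G_\eta = J^\top G_\theta J$, and a set of tangent vectors that maps as $T_\theta = J\,T_\eta$.

```latex
\begin{aligned}
w(T) &= \mathbb{E}_{g \sim \mathcal{N}(0, I_d)} \, \sup_{t \in T} \langle g, t \rangle \\
w_G(T;\theta) &= w\big(G_\theta^{1/2} T\big)
  = \mathbb{E}_{g} \, \sup_{t \in T} \big\langle g, G_\theta^{1/2} t \big\rangle \\
\big(G_\theta^{1/2} J\big)^{\top} \big(G_\theta^{1/2} J\big) &= J^{\top} G_\theta J = G_\eta
  \;\Longrightarrow\; G_\eta^{1/2} = Q \, G_\theta^{1/2} J \quad \text{for some orthogonal } Q \\
w_G(T_\eta;\eta) &= w\big(Q \, G_\theta^{1/2} J \, T_\eta\big)
  = w\big(G_\theta^{1/2} T_\theta\big) = w_G(T_\theta;\theta)
\end{aligned}
```

The last line uses the rotation invariance of Gaussian width: $\langle g, Qs \rangle = \langle Q^\top g, s \rangle$, and $Q^\top g$ is again standard Gaussian, so the orthogonal factor drops out.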
## Key Contributions

1. **Fisher Width Definition**: Introduces Fisher width as a local Fisher-geometric analogue of Gaussian width, with the lifting identity w_G(T;θ) = w(G(θ)^{1/2} T) and reparameterization invariance.
2. **Structural Theory**: Concentration inequalities, algebraic properties, spectral comparison bounds, and stability under metric perturbations.
3. **Generalization Bound**: For Fisher-Lipschitz hypothesis classes, uniform deviation controlled by w_G(T−T;θ₀)/√n, with tightness proof for exponential-family models.
4. **Practical Estimators**: Empirical Fisher, randomized low-rank approximation, and score-based sampling, validated on MNIST (logistic/softmax/ridge regression).

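A simple Monte Carlo estimator follows directly from the lifting identity: rescale each point of T by G(θ)^{1/2}, then average the largest inner product with standard Gaussian draws. This sketch is not one of the paper's estimators (which use the empirical Fisher, low-rank approximation, or score-based sampling); it assumes T is a finite list of points and that G(θ) is diagonal, so G(θ)^{1/2} is an elementwise square root. For T = {e₁, −e₁} the supremum is |g₁|, so the exact Gaussian width is √(2/π) ≈ 0.798.

```python
from __future__ import annotations

import math
import random


def gaussian_width_mc(points: list[list[float]], samples: int = 50_000,
                      seed: int = 0) -> float:
    """Monte Carlo estimate of w(T) = E sup_{t in T} <g, t> for a finite set T."""
    rng = random.Random(seed)
    dim = len(points[0])
    total = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # g ~ N(0, I_d)
        total += max(sum(gi * ti for gi, ti in zip(g, t)) for t in points)
    return total / samples


def fisher_width_mc(points: list[list[float]], fisher_diag: list[float],
                    samples: int = 50_000, seed: int = 0) -> float:
    """Lifting identity w_G(T; theta) = w(G^{1/2} T) for a diagonal Fisher metric G."""
    sqrt_g = [math.sqrt(f) for f in fisher_diag]  # G^{1/2} is elementwise for diagonal G
    lifted = [[s * ti for s, ti in zip(sqrt_g, t)] for t in points]
    return gaussian_width_mc(lifted, samples, seed)


T = [[1.0, 0.0], [-1.0, 0.0]]  # a segment along the first coordinate
print(gaussian_width_mc(T))            # ≈ 0.80, i.e. sqrt(2/pi)
print(fisher_width_mc(T, [4.0, 1.0]))  # ≈ 1.60: the metric stretches the set's own direction
print(fisher_width_mc(T, [1.0, 4.0]))  # ≈ 0.80: stretching an orthogonal direction changes nothing
```

The last two calls show the anisotropy the abstract mentions: the same Euclidean set gets a different Fisher width depending on which directions the metric stretches.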
## Key Concepts

- [[gaussian-width|Gaussian Width]] — Euclidean foundational complexity measure
- [[statistical-manifold|Statistical Manifold]] — Riemannian manifold with Fisher metric
- [[fisher-information-metric|Fisher Information Metric]] — Local metric tensor G(θ)
- [[fisher-lipschitz|Fisher-Lipschitz]] — Hypothesis class with Fisher-geometric smoothness
- [[lifting-identity|Lifting Identity]] — w_G(T;θ) = w(G(θ)^{1/2} T)
- [[empirical-fisher|Empirical Fisher]] — Score-based computation of Fisher information

18
raw/papers/wan-streamer-2026.md
Normal file
@@ -0,0 +1,18 @@
# Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

- **arXiv**: 2606.25041
- **Published**: 2026-06-23
- **Authors**: Lianghua Huang, Zhifan Wu, Wei Wang, Yupeng Shi, Mengyang Feng, Junjie He, Chenwei Xie, Yu Liu, Jingren Zhou, Ang Wang, Bang Zhang, Baole Ai, Chen Liang, Cheng Yu, Chongyang Zhong, Jinwei Qi, Kai Zhu, Pandeng Li, Peng Zhang, Wenyuan Zhang, Xinhua Cheng, Yitong Huang, Yun Zheng, Zoubin Bi (Wan Team, Alibaba Group)
- **Categories**: cs.CV, cs.AI, cs.GR, cs.SD
- **Website**: https://wan-streamer.com
- **Source**: https://arxiv.org/abs/2606.25041

## Abstract

Wan-Streamer is a native-streaming, end-to-end interactive foundation model for real-time, low-latency, full-duplex audio-visual interaction. It models language, audio, and video as both input and output within a single Transformer using block-causal attention for incremental streaming. Unlike cascaded systems relying on separate VAD, ASR, language, TTS, audio-driven animation, or video-generation modules, Wan-Streamer jointly learns perception, reasoning, generation, response timing, turn management, and cross-modal synchronization within one unified model, reducing pipeline latency and error accumulation. Streaming units are as short as 160 ms at 25 fps, with ~200 ms model-side response latency and ~550 ms total interaction latency.

## Key Contributions

1. End-to-end multimodal interactive foundation model: language, audio, and video as both input and output in one Transformer
2. Fully causal multimodal architecture: causal audio/video VAEs, causal encoders/decoders, block-causal attention, full-history autoregressive streaming
3. Thinker-performer inference pipeline with KV-cache exchange, ~200 ms model-side latency, ~550 ms total

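Block-causal attention, as named in the abstract and contribution 2, is commonly implemented as a mask in which each token attends to every token in its own block and in all earlier blocks. The sketch below is a generic illustration, not Wan-Streamer's implementation; it assumes fixed-size blocks of `block_size` tokens and also works through the frame arithmetic behind a 160 ms streaming unit at 25 fps.

```python
from __future__ import annotations


def block_causal_mask(num_tokens: int, block_size: int) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Tokens see every token in their own block (bidirectionally) and in all
    earlier blocks, but nothing in later blocks.
    """
    mask = []
    for i in range(num_tokens):
        block_end = (i // block_size + 1) * block_size  # first index of the next block
        mask.append([j < block_end for j in range(num_tokens)])
    return mask


def frames_per_unit(unit_ms: float, fps: float) -> float:
    """Video frames covered by one streaming unit."""
    return unit_ms * fps / 1000


mask = block_causal_mask(num_tokens=6, block_size=2)
print([sum(row) for row in mask])  # [2, 2, 4, 4, 6, 6]
print(frames_per_unit(160, 25))    # 4.0
```

In this sketch a whole block can be processed in one step, while the causal structure across blocks means earlier blocks never need recomputing when a new unit arrives, which is what allows their KV cache to be reused during streaming.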
51
raw/papers/yao-ace-router-2026.md
Normal file
@@ -0,0 +1,51 @@
---
title: "ACE-Router: Generalizing History-Aware Routing from MCP Tools to the Agent Web"
created: 2026-06-19
updated: 2026-06-19
type: paper-raw
source: https://arxiv.org/abs/2601.08276
arxiv_id: 2601.08276
version: v2
---

# ACE-Router: Generalizing History-Aware Routing from MCP Tools to the Agent Web

**Authors**: Zhiyuan Yao (ZJU), Zishan Xu (SJTU), Yifu Guo (SYSU), Zhiguang Han (NTU), Cheng Yang (HDU), Shuo Zhang, Weinan Zhang (SJTU), Xingshan Zeng, Weiwen Liu (Huawei)
**Published**: 2026-01-13 (v2: 2026-04-19)
**Venue**: arXiv:2601.08276 (cs.AI)
**Code**: https://github.com/euyis1019/ACE-Router

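## Core Insight

ACE-Router reframes MCP tool selection as **training a history-aware router**: instead of static embedding matching, the router reads the multi-turn conversation history and makes precise, context-aware routing decisions.
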
## Three Stages

### 1. Candidate Graph + Self-Evolutionary Mutation
- Builds a candidate graph from semantic similarity (threshold τ = 0.82)
- Five mutation operators: Function Enhancement, Parameter Mutation, Workflow Chaining, Helper Operation, Usage Extension
- 627 initial tools expanded to 2,005 tools through mutation

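The candidate graph in Stage 1 can be sketched as thresholded cosine similarity between tool embeddings: two tools are linked when their similarity reaches τ. The embedding vectors, the cosine metric, and the `build_candidate_graph` helper are assumptions for illustration; the paper's embedding model and exact edge rule are not described here.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_candidate_graph(embeddings: dict[str, list[float]],
                          tau: float = 0.82) -> dict[str, set[str]]:
    """Undirected adjacency sets linking tools whose cosine similarity is >= tau."""
    graph: dict[str, set[str]] = {name: set() for name in embeddings}
    names = list(embeddings)
    for i, a in enumerate(names):
        for b in names[i + 1:]:  # each unordered pair is checked once
            if cosine(embeddings[a], embeddings[b]) >= tau:
                graph[a].add(b)
                graph[b].add(a)
    return graph


# Hypothetical toy embeddings: the two file tools point in similar directions.
tools = {
    "read_file": [1.0, 0.1, 0.0],
    "write_file": [0.9, 0.3, 0.0],
    "send_email": [0.0, 0.2, 1.0],
}
print(build_candidate_graph(tools))  # read_file and write_file are linked; send_email is isolated
```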
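### 2. Trajectory Synthesis (Multi-Agent Simulation)
- Samples from the candidate graph (random-walk DFS)
- Four simulated roles: Planner Agent, User Agent, Assistant Agent, Tool Agent
- Environment-agnostic design: no real APIs are needed; an LLM simulates the execution results
- Produces 15,092 history-aware routing training samples
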
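The sampling step in Stage 2 can be illustrated with a randomized depth-first walk over a candidate graph that returns a short tool chain for the simulated agents to act out. The adjacency dict, the `max_len` cap, the no-revisit rule, and the `sample_tool_chain` name are assumptions; the paper's exact sampler is not described here.

```python
from __future__ import annotations

import random


def sample_tool_chain(graph: dict[str, set[str]], start: str, max_len: int = 4,
                      seed: int | None = None) -> list[str]:
    """Randomized DFS from `start`, returning a simple path of at most `max_len` tools."""
    rng = random.Random(seed)
    path = [start]
    visited = {start}
    stack = [sorted(graph[start])]  # untried neighbors of each node on the path
    while stack and len(path) < max_len:
        options = [n for n in stack[-1] if n not in visited]
        if not options:
            # Dead end: backtrack one step, but keep the start tool in the chain.
            stack.pop()
            if len(path) > 1:
                path.pop()
            continue
        nxt = rng.choice(options)
        stack[-1].remove(nxt)
        visited.add(nxt)
        path.append(nxt)
        stack.append(sorted(graph[nxt]))
    return path


# Hypothetical candidate graph.
graph = {
    "search_web": {"fetch_page"},
    "fetch_page": {"search_web", "summarize", "translate"},
    "summarize": {"fetch_page", "send_email"},
    "translate": {"fetch_page"},
    "send_email": {"summarize"},
}
print(sample_tool_chain(graph, "search_web", max_len=4, seed=7))
```

Every returned chain starts at the given tool, follows graph edges, and never repeats a tool; varying the seed yields different chains for trajectory synthesis.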
### 3. Light Routing Agent (LRA)
- Only two tools: `router_invoke` and `tool_execute`
- Decouples routing decisions from task execution
- Pluggable: supports both tool routing and agent routing

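The Light Routing Agent's split between routing and execution can be sketched as an agent that exposes only two tools. Everything here is hypothetical: `router_invoke` and `tool_execute` reuse the names from the paper, but their signatures, the tool registry, and the keyword-overlap scoring (a stand-in for the trained history-aware router) are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: name -> (description, implementation).
REGISTRY: Dict[str, Tuple[str, Callable[..., str]]] = {
    "get_weather": ("weather forecast for a city",
                    lambda city: f"sunny in {city}"),
    "convert_currency": ("convert an amount between currencies",
                         lambda amount, src, dst: f"{amount} {src} -> {dst}"),
}


def router_invoke(history: List[str]) -> str:
    """Routing tool: pick a tool name from the full conversation history.

    Keyword overlap with each tool description stands in for a trained router.
    """
    words = set(" ".join(history).lower().split())
    return max(REGISTRY, key=lambda name: len(words & set(REGISTRY[name][0].split())))


def tool_execute(name: str, **kwargs: str) -> str:
    """Execution tool: run the routed tool without re-deciding which one to use."""
    _, fn = REGISTRY[name]
    return fn(**kwargs)


history = ["I am flying to Paris tomorrow", "what is the weather forecast there?"]
tool = router_invoke(history)
print(tool, tool_execute(tool, city="Paris"))  # get_weather sunny in Paris
```

Because the agent only ever sees these two tools, its prompt stays the same size however large the registry grows; swapping the registry for a list of agents would give the agent-routing variant.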
## Key Results

| Method | MCP-Universe | MCP-Mark |
|------|:---:|:---:|
| Text-Emb-3-Large (Q) | ~40.95% | ~29.89% |
| ReAct (Gemini-2.5-Pro) | ~41.80% | ~50.00% |
| GPT-4o Router | ~47.41% | ~48.00% |
| **ACE-Router (Qwen3-8B)** | **53.44%** | **60.00%** |

- Expanded candidate pool: ReAct drops from 41.80% to 36.47%, while ACE-Router stays stable at 53.02%
- Noisy environment: GPT-4o falls to 28% and Gemini to 32%, while ACE-Router holds at 56%
- Multi-agent generalization: with no additional training, the router generalizes directly to agent routing

53
raw/papers/zhou-agent-skills-survey-2026.md
Normal file
@@ -0,0 +1,53 @@
---
title: "A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications"
created: 2026-06-19
updated: 2026-06-19
type: paper-raw
source: https://arxiv.org/abs/2605.07358
arxiv_id: 2605.07358
version: v3
---

# A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

**Authors**: Yingli Zhou, Shu Wang, Yaodong Su, Wenchuan Du, Yixiang Fang, Xuemin Lin
**Affiliation**: The Chinese University of Hong Kong, Shenzhen
**Published**: 2026-05-08 (v3: 2026-05-26)
**Venue**: arXiv:2605.07358 (cs.IR)
**Resources**: https://github.com/JayLZhou/Awesome-Agent-Skills

## Abstract

LLM-based agents that reason, plan, and act through tools, memory, and structured interaction are emerging as a promising paradigm for automating complex workflows. This survey examines the challenge through the lens of **agent skills**, defined as reusable procedural artifacts that coordinate tools, memory, and runtime context under task-specific constraints. Agents handle high-level reasoning and planning, while skills form the operational layer that enables reliable, reusable, and composable execution.

The literature is organized around four stages of the agent skill lifecycle: **representation**, **acquisition**, **retrieval**, and **evolution**. The paper also discusses open challenges in quality control, interoperability, safe updating, and long-term capability management.

## Key Contributions

1. Identifies agent skills as a foundational component of LLM agent ecosystems, characterizing their role in bridging the **procedural gap** between raw tool access and robust task execution.
2. Organizes research around four lifecycle stages with representative methods in each.
3. Summarizes agent skills platforms (SkillNet, ClawHub, SkillHub, SkillsMP, Skills.sh), application scenarios, and open challenges.

## Formal Definition

A skill is a tuple **S = (M, R, C)**:
- **M**: root instruction document
- **R**: auxiliary resources (references, templates, scripts)
- **C**: applicability conditions (metadata, descriptions, embeddings)

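The tuple definition maps naturally onto a small record type. This is an illustration, not a schema from the survey: the field names, the keyword-set form of C, and the overlap-based `match_skills` helper (one simple member of the Sparse/Keyword retrieval family in the taxonomy) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    """A skill S = (M, R, C)."""
    name: str
    instructions: str                                        # M: root instruction document
    resources: dict[str, str] = field(default_factory=dict)  # R: file name -> content (assumed)
    conditions: set[str] = field(default_factory=set)        # C: trigger keywords (assumed form)


def match_skills(query: str, skills: list[Skill], top_k: int = 1) -> list[str]:
    """Sparse keyword retrieval: rank skills by overlap between query words and C."""
    words = set(query.lower().split())
    ranked = sorted(skills, key=lambda s: len(words & s.conditions), reverse=True)
    return [s.name for s in ranked[:top_k] if words & s.conditions]


# Hypothetical skill library.
library = [
    Skill("pdf-report", "Extract tables from a PDF and write a summary.",
          {"template.md": "# Report"}, {"pdf", "report", "tables"}),
    Skill("git-release", "Tag the repo and draft release notes.",
          {"release.sh": "git tag $VERSION"}, {"release", "tag", "changelog"}),
]
print(match_skills("draft a release changelog", library))  # ['git-release']
```

Keeping C separate from M is what lets retrieval run over short conditions rather than full instruction documents; a dense-embedding retriever would replace the keyword set with a vector.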
## Taxonomy at a Glance

| Stage | Categories |
|-------|-----------|
| Representation | Text-Based, Code-Backed, Hybrid-Based |
| Acquisition | Human-Derived, Experience-Derived, Task-Derived, Corpus-Derived |
| Retrieval | Dense Embedding, Sparse/Keyword, Generative, Structure-Aware (Hierarchical + Dependency Graph) |
| Selection | Context-Aware, Skill Composition, Cost/Utility-Aware, Feedback-Driven |
| Evolution | Skill Revision, Skill Validation, Policy Coupling, Repository Evolution, Runtime Governance |

## Open Challenges

- **Acquisition**: Abstraction quality, weak trigger specification, resource drift, admission quality at scale
- **Retrieval**: Scalable skill libraries, constraint-aware composition, multi-objective selection, execution-centric evaluation
- **Evolution**: Coarse artifact-level evaluation, asymmetric revision (add > rewrite/retire), weakly specified repository governance, confounded gains
- **Future**: Unified skill schema, resource-aware joint optimization, lifecycle-level robustness, causality-driven skill diagnosis