commit dd8345a6eaa13fd997010864bae92b5560499994 Author: Sidney Zhang Date: Mon Apr 20 11:42:41 2026 +0800 20260420:first commit diff --git a/README.md b/README.md new file mode 100644 index 0000000..5d3e580 --- /dev/null +++ b/README.md @@ -0,0 +1,3 @@ +# Wikiplace + +wiki文件档案。SZ的那这些关注点内容。 diff --git a/SCHEMA.md b/SCHEMA.md new file mode 100644 index 0000000..c50825b --- /dev/null +++ b/SCHEMA.md @@ -0,0 +1,115 @@ +# Wiki Schema + +## Domain +跨学科知识库:数学研究、AI/ML 研究、编程语言技术、学习笔记与阅读资料的综合整理。 + +## Conventions +- 文件名:小写,使用连字符,无空格(如 `transformer-architecture.md`, `linear-algebra-basics.md`) +- 每个 wiki 页面必须以 YAML frontmatter 开头(见下方) +- 使用 `[[wikilinks]]` 链接页面(每页至少 2 个出站链接) +- 更新页面时,必须更新 `updated` 日期 +- 每个新页面必须添加到 `index.md` 的正确分类下 +- 每个操作必须追加到 `log.md` +- 数学公式使用 LaTeX 格式:行内 `$...$`,块级 `$$...$$` +- 代码片段使用 fenced code blocks 并标明语言 + +## Frontmatter +```yaml +--- +title: 页面标题 +created: YYYY-MM-DD +updated: YYYY-MM-DD +type: entity | concept | comparison | query | summary | paper | book +tags: [来自下方分类] +sources: [raw/articles/source-name.md] +--- +``` + +## Tag Taxonomy + +### 数学 (Mathematics) +- algebra - 代数 +- analysis - 分析学 +- geometry - 几何 +- topology - 拓扑学 +- number-theory - 数论 +- probability - 概率论 +- statistics - 统计学 +- optimization - 优化理论 +- linear-algebra - 线性代数 +- calculus - 微积分 +- discrete-math - 离散数学 + +### AI/ML +- deep-learning - 深度学习 +- llm - 大语言模型 +- transformer - Transformer 架构 +- neural-network - 神经网络 +- training - 训练方法 +- inference - 推理优化 +- fine-tuning - 微调 +- alignment - 对齐/安全 +- benchmark - 评测基准 +- architecture - 模型架构 +- paper - 学术论文 + +### 编程与技术 +- python - Python 语言 +- rust - Rust 语言 +- javascript - JavaScript/TypeScript +- cpp - C/C++ +- algorithm - 算法 +- system-design - 系统设计 +- concurrency - 并发编程 +- performance - 性能优化 +- tooling - 开发工具 + +### 元标签 (Meta) +- book - 书籍 +- course - 课程 +- tutorial - 教程 +- concept - 概念解释 +- comparison - 对比分析 +- timeline - 时间线 +- person - 人物 +- organization - 组织/公司 +- open-source - 开源项目 +- research - 研究笔记 + 
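The frontmatter format and tag taxonomy above lend themselves to a mechanical check. The following is a minimal sketch, assuming each page carries its tags on a single `tags: [a, b]` line as in the template; the function name `unknown_tags` and the constant `ALLOWED_TAGS` are hypothetical helpers for illustration, not part of this repository.

```python
import re

# Tags copied from the four taxonomy groups in SCHEMA.md.
ALLOWED_TAGS = {
    # Mathematics
    "algebra", "analysis", "geometry", "topology", "number-theory",
    "probability", "statistics", "optimization", "linear-algebra",
    "calculus", "discrete-math",
    # AI/ML
    "deep-learning", "llm", "transformer", "neural-network", "training",
    "inference", "fine-tuning", "alignment", "benchmark", "architecture",
    "paper",
    # Programming and technology
    "python", "rust", "javascript", "cpp", "algorithm", "system-design",
    "concurrency", "performance", "tooling",
    # Meta
    "book", "course", "tutorial", "concept", "comparison", "timeline",
    "person", "organization", "open-source", "research",
}

# Matches a single-line YAML list such as: tags: [llm, research]
TAGS_LINE = re.compile(r"^tags:\s*\[(.*?)\]\s*$", re.MULTILINE)


def unknown_tags(page_text: str) -> list[str]:
    """Return the frontmatter tags that are missing from ALLOWED_TAGS."""
    # The frontmatter is the block between the first two '---' markers.
    parts = page_text.split("---", 2)
    if len(parts) < 3 or parts[0].strip():
        raise ValueError("page does not start with YAML frontmatter")
    match = TAGS_LINE.search(parts[1])
    if match is None:
        return []
    tags = [t.strip() for t in match.group(1).split(",") if t.strip()]
    return [t for t in tags if t not in ALLOWED_TAGS]


if __name__ == "__main__":
    sample = "---\ntitle: Demo\ntags: [machine-learning, benchmark]\n---\n# Demo\n"
    print(unknown_tags(sample))
```

A check like this could run as part of a lint pass; a real YAML parser would be the safer choice if tags are ever written in multi-line list form.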
+**规则**:每个页面使用的标签必须来自上述分类。如需新标签,先在此添加,再使用。防止标签泛滥。 + +## Page Thresholds +- **创建页面**:当实体/概念出现在 2+ 个来源中,或在一个来源中占据核心地位 +- **添加到现有页面**:当来源提及已有内容时 +- **不创建页面**:仅出现一次的次要细节,或超出领域范围的内容 +- **拆分页面**:当页面超过 ~200 行时——拆分为子主题并交叉链接 +- **归档页面**:当内容完全被取代时——移至 `_archive/`,从索引中移除 + +## Entity Pages(实体页面) +每个值得注意的实体一个页面。包含: +- 概述 / 是什么 +- 关键事实和日期 +- 与其他实体的关系([[wikilinks]]) +- 来源引用 + +## Concept Pages(概念页面) +每个概念或主题一个页面。包含: +- 定义 / 解释 +- 当前知识状态 +- 未解决问题或争议 +- 相关概念([[wikilinks]]) +- 数学推导(如适用) + +## Comparison Pages(对比页面) +并排分析。包含: +- 对比什么以及为什么 +- 对比维度(表格格式优先) +- 结论或综合 +- 来源 + +## Update Policy +当新信息与现有内容冲突时: +1. 检查日期——较新的来源通常优先于较旧的 +2. 如果确实存在矛盾,注明两种观点及日期和来源 +3. 在 frontmatter 中标记:`contradictions: [page-name]` +4. 在 lint 报告中标记供用户审核 diff --git a/articles/oppo-multimodal-data-lake.md b/articles/oppo-multimodal-data-lake.md new file mode 100644 index 0000000..c371b04 --- /dev/null +++ b/articles/oppo-multimodal-data-lake.md @@ -0,0 +1,38 @@ +--- +title: "OPPO 多模态数据湖架构实践" +created: 2026-04-19 +updated: 2026-04-19 +type: summary +tags: [llm, system-design, deep-learning, research] +sources: [raw/articles/oppo-multimodal-data-lake-2026.md] +--- + +# OPPO 多模态数据湖架构实践 + +**来源:** Data for AI Meetup · 2026 +**分享人:** David (OPPO 大数据架构负责人) +**链接:** https://mp.weixin.qq.com/s/cBaYa04qAIGsxG1hD7ll3w + +## 核心背景 +OPPO 的大数据基础设施从离线 Hive/Spark 演进至全模态数据湖阶段,主要服务于三大场景:手机影像算法迭代、多模态推荐搜索、多模态端侧 Agent。数据爆发式增长带来了数据孤岛、元数据混乱和云上 IO 瓶颈等挑战。 + +## 架构设计 (四层模型) + +| 层级 | 技术选型 | 作用 | +|------|----------|------| +| **计算引擎** | Spark + 二开 Lance | 统一全模态数据查询,基于 Lance 8K 开源项目二次开发 | +| **元数据管理** | [[gravitino-unified-metadata]] | 统一 Catalog,支持 Hive 与 Lance 表同目录管理,多云分布,资产全局可感知 | +| **加速层** | [[curvine-distributed-cache]] | 自研云原生分布式缓存,解决 OSS 带宽配额、专线压力及计算节点磁盘闲置问题 | +| **平台产品层** | 数据地图/权限/治理 | 复用现有能力,实现多模态数据资产统一管理 | + +## 关键成果 + +1. **统一元数据**:一套目录同时管理 Hive 和 Lance 表,支持单条 SQL 跨表 JOIN 查询 +2. **控制增量转换存量**:强制所有新增目录通过 Gravitino 访问,逐步收归 PB 级散落算法数据 +3. **Curvine 加速验证**:社区版 LanceDB + Curvine 的向量查询性能达到商业版水平 +4. 
**多云无感迁移**:混合云架构(自建+阿里云)下,数据分布对业务透明 + +## 相关概念 + +- [[gravitino-unified-metadata]] — Gravitino 统一元数据方案 +- [[curvine-distributed-cache]] — Curvine 分布式缓存系统 diff --git a/concepts/agent-mediated-deception.md b/concepts/agent-mediated-deception.md new file mode 100644 index 0000000..14b387b --- /dev/null +++ b/concepts/agent-mediated-deception.md @@ -0,0 +1,47 @@ +--- +title: "代理中介欺骗 (Agent-Mediated Deception)" +created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [alignment, deep-learning, research] +sources: [raw/papers/li-amd-human-perception-2026.md] +--- + +# 代理中介欺骗 (Agent-Mediated Deception, AMD) + +## 定义 + +Agent-Mediated Deception (AMD) 是一种新型攻击面,指被攻破或恶意设计的 LLM Agent 被用作武器,对其人类用户实施欺骗。这与传统的 Agent 自身安全风险不同,关注的是**Agent 作为中介对人类认知的攻击**。 + +## 攻击机制 + +当 Agent 被外部攻击者劫持,或模型内部产生欺骗性行为时,它可能: +- 提供看似合理但错误的建议 +- 隐藏关键安全信息 +- 利用用户的信任进行社会工程学攻击 + +## 人类脆弱性 + +根据 Li et al. (2026) 的实证研究(303 名参与者): +- **仅 8.6%** 的用户能察觉到 AMD 攻击 +- 领域专家在特定场景下**更易受骗**(过度信任自动化工具) +- 识别出 **6 种认知失败模式** +- 风险意识与保护行为之间存在显著鸿沟 + +## 防御策略 + +- **有效警告**:应中断当前工作流,且验证成本低廉 +- **经验学习**:通过 HAT-Lab 等平台的模拟训练,>90% 用户能提高警惕 +- **人机协作设计**:需要重新思考 Agent 输出的人类可验证性 + +## 开放问题 + +- 如何设计 Agent 架构使其行为对人类可审计? +- AMD 攻击的自动化检测方法? +- 如何在保持 Agent 效率的同时降低人类易感性? 
+ +## 相关概念 + +- [[li-amd-human-perception]] — 原始论文 +- [[human-agent-trust]] — 人机信任研究 +- [[alignment]] — AI 对齐与安全 diff --git a/concepts/ai-mathematics.md b/concepts/ai-mathematics.md new file mode 100644 index 0000000..58dcaff --- /dev/null +++ b/concepts/ai-mathematics.md @@ -0,0 +1,66 @@ +--- +title: "AI and Mathematics (AI 与数学)" +created: 2025-04-15 +updated: 2025-04-15 +type: concept +tags: [concept, ai-mathematics, llm, deep-learning, mathematics, research] +sources: [raw/papers/tao-ai-mathematical-methods-2026.md] +--- + +# AI and Mathematics (AI 与数学) + +## 概述 + +AI 与数学的交叉是当代最活跃的研究领域之一。数学被视为探索 AI 能力和限制的"沙盒"(sandbox)。 + +## AI 在数学中的应用 + +### 当前能力 +- 解决越来越复杂的数学问题 +- 生成可独立验证的证明 +- 协助数学家解决深奥的数学猜想 + +### 典型弱点 +[[Terence Tao]] 指出当前 AI 工具展示出**显著且常常荒谬的弱点**: +- 在某些任务上超越人类专家 +- 同时在基础概念上犯**令人捂脸的基本错误** + +**Example**: 断言"所有奇数都是质数"——这类错误在人类的数学训练早期就会被纠正 + +## 数学作为 "沙盒" + +[[Terence Tao]] 认为数学是探索 AI 影响的理想领域: + +1. **成熟的基础** - 数学有着深厚的历史和严谨的基础 +2. **假设性场景** - 适合探索与现实相反的抽象情境 +3. **客观标准** - 数学证明有明确的对/错标准 +4. **社区反馈** - 数学社区可以快速评估 AI 输出 + +## 对数学研究的影响 + +### 积极方面 +- 自动化繁琐的计算和验证 +- 辅助发现新的数学结果 +- 加速科学研究 + +### 潜在风险 +- **教育问题** - 学生过度依赖 AI,难以培养数学眼光和直觉 +- **证明质量** - "无味证明"泛滥:技术正确但缺乏启发性 +- **认知脱节** - 证明能力与推理过程的分离 + +## 未来发展方向 + +根据论文,数学研究可能会: + +1. **劳动分工** - 数学家专门化(使用 AI vs. 提出方向) +2. **方法多样化** - 采用自然科学和人文学科的方法 +3. 
**重新定义标准** - 在自动验证时代重新定义 "好数学" + +## 关联页面 + +- [[Mathematical methods and human thought in the age of AI]] - 详细阐述 +- [[Terence Tao]] - 该领域的主要思想家 +- [[human-centered-ai]] - 以人类为中心的 AI +- [[formal-verification]] - 形式化验证 +- [[alpha-proof]] - DeepMind 的数学证明 AI +- [[lean-mathlib]] - 大型形式化数学库 diff --git a/concepts/computerized-adaptive-testing.md b/concepts/computerized-adaptive-testing.md new file mode 100644 index 0000000..4564294 --- /dev/null +++ b/concepts/computerized-adaptive-testing.md @@ -0,0 +1,120 @@ +--- +title: Computerized Adaptive Testing (CAT) +created: 2026-04-17 +updated: 2026-04-17 +type: concept +tags: [machine-learning, benchmark] +sources: [raw/papers/zhuang-catsurvey-ml-2024.md] +--- + +# Computerized Adaptive Testing (CAT) + +## Definition +Computerized Adaptive Testing (CAT) 是一种动态测评范式:系统根据考生实时表现,自适应地调整后续题目难度,以最少的题量实现对个体能力的高精度评估。相比传统固定试卷测试,CAT 题量更少、测量精度更高。 + +## 核心组件 + +CAT 系统由四个关键模块组成: + +### 1. Measurement Models (测量模型) +- **传统方法:** Item Response Theory (IRT) — 基于项目反应理论的概率模型,假设题目难度与考生能力之间存在 S 型响应曲线 +- **ML 方法:** 神经网络、深度知识追踪 (Deep Knowledge Tracing)、基于表示学习的测量模型 — 能够捕捉更复杂的题目-能力交互模式 + +### 2. Question Selection Algorithms (选题策略) +- **经典策略:** Maximum Fisher Information (MFI)、Maximum Posterior Weighted Information (MPWI) +- **ML 策略:** 基于强化学习的选题、多臂老虎机 (Multi-armed Bandit)、深度 Q-Network — 在信息增益、暴露率控制、内容平衡之间做多目标优化 + +### 3. Question Bank Construction (题库构建) +- 题目标定 (calibration)、参数估计、题目质量监控 +- ML 方法可用于自动题目生成、难度预测、题目相似度聚类 + +### 4. 
Test Control (测试控制) +- 终止规则 (stopping criteria):固定长度 vs 精度阈值 +- 内容平衡约束、题目曝光率控制、公平性约束 +- ML 方法:学习型终止规则、约束满足优化 + +## 应用领域 +- **教育测评:** K-12 标准化考试、语言能力测试 (GRE, GMAT) +- **医疗评估:** 症状筛查量表、心理健康测评 +- **体育科学:** 运动员能力分级 +- **社会学研究:** 态度与价值观量表 +- **AI 模型评估:** 自适应 benchmarking,根据模型表现动态调整测试难度(与 [[symbolic-regression]] 等评估场景相关) + +## ML 视角的范式转变 + +传统 CAT 依赖心理测量学和统计学假设(如 IRT 的局部独立性、单维性假设)。随着大规模测试场景复杂度上升,机器学习提供了新的可能性: + +| 维度 | 传统心理测量学 | 机器学习方法 | +|------|--------------|-------------| +| 建模假设 | 强假设(单维性、局部独立) | 弱假设、数据驱动 | +| 可扩展性 | 适合中小规模题库 | 天然支持大规模 | +| 表达能力 | 线性/对数几率 | 非线性、高维交互 | +| 可解释性 | 高(参数有明确意义) | 较低(黑盒风险) | +| 公平性 | 已有成熟 DIF 检测 | 正在发展中 | + +## IRT 数学形式 + +Item Response Theory 是传统 CAT 的核心数学引擎。 + +### 核心符号 +- 考生能力: $\theta \in \mathbb{R}$ +- 题目 $i$ 参数: $\psi_i = (a_i, b_i, c_i)$ +- 作答: $u_i \in \{0, 1\}$ +- ICC (Item Characteristic Curve): $P_i(\theta) = P(u_i = 1 \mid \theta, \psi_i)$ + +### 模型层级 + +**1PL (Rasch Model):** +$$P_i(\theta) = \frac{1}{1 + e^{-(\theta - b_i)}}$$ +仅含难度参数 $b_i$。当 $\theta = b_i$ 时 $P_i = 0.5$。 + +**2PL (CAT 最常用):** +$$P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}$$ +区分度 $a_i > 0$ 控制曲线斜率。导数: $\frac{dP_i}{d\theta} = a_i P_i(1 - P_i)$,在 $\theta = b_i$ 处达最大值 $a_i / 4$。 + +**3PL (含猜测):** +$$P_i(\theta) = c_i + (1 - c_i) \frac{1}{1 + e^{-a_i(\theta - b_i)}}$$ +猜测概率 $c_i \in [0,1]$。$\theta \to -\infty$ 时 $P_i \to c_i$。 + +### Fisher 信息量与选题 + +题目 $i$ 的 Fisher 信息: +$$I_i(\theta) = \frac{[\partial P_i / \partial \theta]^2}{P_i(1 - P_i)} = a_i^2 P_i(\theta)(1 - P_i(\theta)) \quad (\text{2PL})$$ + +- $\theta = b_i$ 时信息量最大: $I_i = a_i^2 / 4$ +- $\theta \gg b_i$ 或 $\theta \ll b_i$ 时 $I_i \to 0$ + +**CAT 选题:** $i^* = \arg\max_{i} I_i(\hat{\theta}_{\text{当前}})$ + +### 能力估计 + +**对数似然:** +$$\ell(\theta) = \sum_{j=1}^{t} \left[ u_j \ln P_j(\theta) + (1 - u_j) \ln(1 - P_j(\theta)) \right]$$ + +**Newton-Raphson 迭代:** +$$\theta^{(k+1)} = \theta^{(k)} + \frac{\ell'(\theta^{(k)})}{I(\theta^{(k)})}, \quad I(\theta) = \sum_{j=1}^t I_j(\theta)$$ + +**标准误:** 
$SE(\hat{\theta}) = 1 / \sqrt{I(\hat{\theta})}$ + +### 多维 IRT (MIRT) + +$$P_i(\boldsymbol{\theta}) = \frac{1}{1 + e^{-(\mathbf{a}_i^\top \boldsymbol{\theta} - d_i)}}, \quad \boldsymbol{\theta} \in \mathbb{R}^D$$ + +对应多维自适应测试 (MAT),选题需最大化多维信息矩阵的标量函数(行列式或迹)。 + +## 开放问题与挑战 +1. **公平性与偏差:** 自适应算法可能放大历史数据中的群体偏差 +2. **可解释性:** 深度学习模型的可解释性 vs 心理测量学的透明度 +3. **冷启动问题:** 新题目/新考生的初始参数估计 +4. **安全性:** 题库泄露风险、对抗性攻击 +5. **跨模态测评:** 如何整合文本、图像、交互等多模态数据 +6. **LLM 测评:** 如何用 CAT 范式评估大语言模型能力(自适应 benchmarking) + +## 相关概念 + +- [[cramer-rao-lower-bound]] — CRLB 设定了 CAT 能力估计方差的理论下界,CAT 选题策略本质上是在最大化 Fisher 信息以快速逼近该下界 +- [[symbolic-regression]] — 符号回归中的自适应搜索策略与 CAT 选题策略在"动态探索-利用权衡"上有结构相似性 +- [[knowledge-bank]] — 自适应测评系统需要结构化知识/题库管理,与知识管理系统的设计思想相通 + +## 关键文献 +- Zhuang et al. (2024/2026). *Survey of Computerized Adaptive Testing: A Machine Learning Perspective*. arXiv:2404.00712v4. Accepted by IEEE TPAMI 2026. diff --git a/concepts/cramer-rao-lower-bound.md b/concepts/cramer-rao-lower-bound.md new file mode 100644 index 0000000..1ca3fd9 --- /dev/null +++ b/concepts/cramer-rao-lower-bound.md @@ -0,0 +1,77 @@ +--- +title: Cramér-Rao Lower Bound (CRLB) +created: 2026-04-17 +updated: 2026-04-17 +type: concept +tags: [machine-learning, benchmark] +sources: [raw/papers/hbs-cramerrao-bound-notes.md] +--- + +# Cramér-Rao Lower Bound (CRLB) + +## Definition +The Cramér-Rao Lower Bound (CRLB) states that for **any unbiased estimator** of a population parameter $\theta$, the lowest possible variance is the reciprocal of the Fisher Information $I(\theta)$: +$$\text{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$$ + +It represents a fundamental limit in statistical estimation: no matter how clever your estimation method is, you cannot beat this bound. + +## Key Concepts + +### 1. 
The Score Function +The score $g(\theta; \mathbf{x})$ is the derivative of the log-likelihood with respect to the parameter: +$$g(\theta; \mathbf{x}) = \frac{\partial}{\partial \theta} \log f(\mathbf{x} \mid \theta)$$ +- It measures the "force" the data exerts on the parameter estimate. +- **Crucial property:** $\mathbb{E}[g(\theta; \mathbf{x})] = 0$ (under regularity conditions). + +### 2. Fisher Information +Fisher Information $I(\theta)$ is the variance of the score function: +$$I(\theta) = \text{Var}(g(\theta; \mathbf{x})) = \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(\mathbf{x} \mid \theta) \right)^2 \right]$$ + +**Alternative expression (via curvature):** +$$I(\theta) = -\mathbb{E}\left[ \frac{\partial^2}{\partial \theta^2} \log f(\mathbf{x} \mid \theta) \right]$$ +This connects information directly to the curvature of the log-likelihood function. A sharper peak (higher curvature) means higher information and a tighter bound. + +**Properties:** +- $I(\theta)$ is proportional to sample size $n$ ($I_n = n \cdot I_1$). +- Higher variance in the data means lower information per data point. + +### 3. Observed vs. Expected Information +- **Expected Information:** Uses the true parameter and expectation over all possible data. Formula-based. +- **Observed Information:** Uses the actual observed data and the estimated parameter $\hat{\theta}$. Computed from the Hessian of the log-likelihood at $\hat{\theta}$. +- In practice (especially in MLE), standard errors are calculated using the observed information. + +## Classic Examples + +### Normal Distribution (Mean Estimation) +- **Parameter:** $\mu$ +- **Score:** $g(\mu) = \frac{n}{\sigma^2}(\bar{x} - \mu)$ +- **Fisher Information:** $I = \frac{n}{\sigma^2}$ +- **CRLB:** $\frac{\sigma^2}{n}$ +- **Conclusion:** The sample mean $\bar{x}$ is the "best" unbiased estimator, as its variance exactly hits the bound. 
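The normal-mean example can be checked by simulation. The sketch below, which uses only the Python standard library, compares the empirical variance of the sample mean with the bound $\sigma^2/n$; the function names, seed, and trial count are illustrative assumptions rather than anything taken from the source notes.

```python
import random
import statistics


def crlb_normal_mean(sigma: float, n: int) -> float:
    """CRLB for the mean of N(mu, sigma^2) from n i.i.d. draws: sigma^2 / n."""
    return sigma ** 2 / n


def empirical_variance_of_mean(mu: float, sigma: float, n: int,
                               trials: int, seed: int = 0) -> float:
    """Variance of the sample mean across repeated simulated experiments."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pvariance(means)


if __name__ == "__main__":
    bound = crlb_normal_mean(sigma=2.0, n=25)
    observed = empirical_variance_of_mean(mu=1.0, sigma=2.0, n=25, trials=4000)
    print(f"CRLB = {bound:.4f}, empirical Var(x_bar) = {observed:.4f}")
```

With enough trials the two numbers should agree up to Monte Carlo noise, which is what it means for the sample mean to attain the bound.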
+ +### Binomial Distribution (Proportion Estimation) +- **Parameter:** $\pi$ +- **Score:** $g(\pi) = \frac{k}{\pi} - \frac{n-k}{1-\pi}$ +- **Fisher Information:** $I = \frac{n}{\pi(1-\pi)}$ +- **CRLB:** $\frac{\pi(1-\pi)}{n}$ +- **Conclusion:** The sample proportion $\hat{\pi} = k/n$ is the optimal unbiased estimator. + +## Connection to Maximum Likelihood Estimation (MLE) +- MLE is **consistent** and **asymptotically efficient**. +- As sample size $n \to \infty$, the variance of the MLE approaches the CRLB: $\text{Var}(\hat{\theta}_{\text{MLE}}) \approx 1/I(\theta)$. +- This is why standard errors reported by MLE software are calculated as $1/\sqrt{I_{\text{observed}}}$. + +## Role in Computerized Adaptive Testing (CAT) +In CAT, the CRLB dictates the theoretical limit of measurement precision. +- Each question contributes a certain amount of Fisher Information $I_i(\theta)$. +- The test continues until the accumulated information $I(\theta) = \sum I_i(\theta)$ is large enough that $1/I(\theta)$ (the minimum possible variance) is below a predefined threshold. +- **选题策略 (Item Selection):** Choosing the item with the maximum $I_i(\theta)$ at the current ability estimate $\hat{\theta}$ is equivalent to driving the CRLB down as fast as possible. + +## Multidimensional Extension (Information Matrix) +For a vector of parameters $\boldsymbol{\theta}$, the Fisher Information becomes a matrix $\mathbf{I}(\boldsymbol{\theta})$. The CRLB states that the covariance matrix of any unbiased estimator satisfies: +$$\text{Cov}(\hat{\boldsymbol{\theta}}) \succeq \mathbf{I}(\boldsymbol{\theta})^{-1}$$ +(where $\succeq$ denotes positive semi-definiteness). 
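The CAT stopping rule and item-selection step described above can be sketched in a few lines. This is a hedged illustration that assumes the 2PL model, where an item with discrimination $a_i$ and difficulty $b_i$ has information $I_i(\theta) = a_i^2 P_i(\theta)(1 - P_i(\theta))$; the function names and the example item bank are hypothetical.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of one 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_item(theta_hat: float, items: list, used: set) -> int:
    """Index of the unused item with maximum information at theta_hat."""
    candidates = [i for i in range(len(items)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta_hat, *items[i]))


def should_stop(theta_hat: float, items: list, used: set,
                var_threshold: float) -> bool:
    """Stop once the CRLB 1 / I(theta_hat) falls to the variance threshold."""
    total = sum(item_information(theta_hat, *items[i]) for i in used)
    return total > 0 and 1.0 / total <= var_threshold


if __name__ == "__main__":
    # Hypothetical item bank: (discrimination a, difficulty b).
    bank = [(1.0, -2.0), (2.0, 0.0), (1.0, 2.0)]
    first = select_item(0.0, bank, used=set())
    print(first, should_stop(0.0, bank, {first}, var_threshold=1.0))
```

Evaluating the bound at $\hat{\theta}$ rather than at the unknown true $\theta$ is the usual practical compromise; it corresponds to the observed-information view discussed earlier on this page.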
+ +## 相关概念 +- [[computerized-adaptive-testing]] — CAT 的核心目标是最小化能力估计方差,CRLB 提供了理论下界,选题策略本质上是在最大化 Fisher 信息以快速逼近该下界。 +- [[eml-universal-operator]] — EML 树的梯度优化依赖于对参数空间的曲率估计,与 CRLB 中 Fisher 信息作为对数似然曲率的数学本质相通。 diff --git a/concepts/curvine-distributed-cache.md b/concepts/curvine-distributed-cache.md new file mode 100644 index 0000000..9f3127f --- /dev/null +++ b/concepts/curvine-distributed-cache.md @@ -0,0 +1,41 @@ +--- +title: "Curvine 云原生分布式缓存" +created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [system-design, performance, tooling] +sources: [raw/articles/oppo-multimodal-data-lake-2026.md] +--- + +# Curvine 云原生分布式缓存 + +**开发者:** OPPO (已开源) · GitHub: https://github.com/curvineio/curvine + +## 定义 +Curvine 是 OPPO 自研并开源的云原生高性能分布式缓存文件系统,专为解决云上对象存储 IO 性能瓶颈而设计。 + +## 解决的问题 +1. **OSS 带宽配额瓶颈**:云厂商默认读带宽限制在大数据场景下易成瓶颈 +2. **专线带宽压力**:混合云架构下,重复读取易打爆专线,影响其他业务 +3. **计算节点磁盘闲置**:节点配置的云盘(如 2.5TB)主要用于 Shuffle,利用率常低于 20% + +## 核心特性 +- **双模式支持**: + - 缓存模式:读写与 OSS 保持一致 + - FS 模式:Curvine 管理元数据,支持完整 POSIX 语义,对象存储数据可作本地盘访问 +- **协议兼容**:支持 S3、HDFS 协议,原生支持 Kubernetes CSI 模式 +- **任务调度**:常驻服务,处理数据加载和大文件操作 + +## 应用场景与性能 +- **LanceDB 向量查询加速**:社区版 LanceDB + Curvine 性能 ≈ LanceDB 商业版 +- **索引与元数据缓存**:支持预热模式,高性能访问 LanceDB 索引和 Manifest +- **热表数据加速**:重复读取数据从 OSS 加载至本地缓存盘 +- **Checkpoint 写入加速**:高频模型训练写入提供高性能支持 + +## 未来规划 +- 扩展为数据转换服务层:自动转 Lance 格式、自动构建索引、小文件自动合并 + +## 相关概念 + +- [[oppo-multimodal-data-lake]] — OPPO 数据湖实践 +- [[gravitino-unified-metadata]] — 元数据管理配套 diff --git a/concepts/depth-scaling-signal-degradation.md b/concepts/depth-scaling-signal-degradation.md new file mode 100644 index 0000000..d8be2ba --- /dev/null +++ b/concepts/depth-scaling-signal-degradation.md @@ -0,0 +1,37 @@ +--- +title: "LLM 深度扩展与信号退化" +created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [architecture, deep-learning, transformer] +sources: [raw/papers/zhu-moda-mixture-of-depths-2026.md] +--- + +# LLM 深度扩展与信号退化 (Depth Scaling & Signal Degradation) + +## 背景 + +增加模型深度是提升 LLM 
性能的关键途径之一。然而,深度扩展面临**信号退化**问题:随着层数增加,浅层提取的信息特征在多次残差更新中被稀释,导致深层难以有效利用这些特征。 + +## 信号退化机制 + +在标准 Transformer 的残差流(Residual Stream)中: +$$x_{l+1} = x_l + f_l(x_l)$$ +其中 $f_l$ 是第 $l$ 层的变换(注意力 + FFN)。随着 $l$ 增加,$x_0$ 的原始信息被多次叠加的 $f_k$ 覆盖,导致"遗忘"。 + +## 缓解策略 + +### 架构级 +- **MoDA (Mixture-of-Depths Attention)**:注意力头直接跨层访问前序 KV [[mixture-of-depths-attention]] +- **残差连接变体**:如 Pre-Norm vs Post-Norm,影响梯度流动 +- **层归一化位置**:Post-Norm 在 MoDA 中表现更好 + +### 训练级 +- **深度初始化**:特殊初始化策略保持信号幅度 +- **梯度裁剪与缩放**:防止深层梯度爆炸/消失 + +## 相关概念 + +- [[mixture-of-depths-attention]] — MoDA 机制 +- [[zhu-moda-mixture-of-depths]] — MoDA 论文 +- [[transformer-architecture]] — Transformer 基础架构 diff --git a/concepts/eml-operator.md b/concepts/eml-operator.md new file mode 100644 index 0000000..1dabda7 --- /dev/null +++ b/concepts/eml-operator.md @@ -0,0 +1,128 @@ +--- +title: "EML 算子 (Exp-Minus-Log)" +created: 2026-04-16 +updated: 2026-04-16 +type: concept +tags: [algorithm, concept, research] +sources: [raw/papers/odrzywolek-eml-single-operator-2026.md] +--- + +# EML 算子 (Exp-Minus-Log) + +## 定义 + +EML (Exp-Minus-Log) 是一个二元算子,定义为: + +$$\text{eml}(x,y) = \exp(x) - \ln(y)$$ + +该算子配合常数 $1$,构成了连续数学中的 **Sheffer 算子**——单一算子足以生成所有初等函数。 + +## 核心性质 + +### 完备性 +- 与数字电路中的 NAND 门类似,EML 对初等函数具有完备性 +- 两按钮计算器 $(1, \text{eml})$ 可替代 36 按钮科学计算器 +- 可生成:所有算术运算、超越函数、数学常数 ($e,\pi,i$) + +### 二叉树结构 +每个 EML 表达式是同质节点的二叉树: + +$$S \to 1 \mid \text{eml}(S,S)$$ + +这种结构与满二叉树一一对应,给定节点数的表达式个数由 Catalan 数给出,提供了规则的搜索空间。 + +### 复数中间值 +- EML 计算需要在复数域内进行(至少内部如此) +- 类似于量子计算使用复振幅计算实概率 +- 生成 $i$ 和 $\pi$ 需要计算 $\ln(-1)$ + +## 基本构造示例 + +| 目标 | EML 表达式 | 深度 | +|------|-----------|------| +| $e$ | $\text{eml}(1,1)$ | 1 | +| $e^x$ | $\text{eml}(x,1)$ | 1 | +| $\ln(x)$ | $\text{eml}(1,\text{eml}(\text{eml}(1,x),1))$ | 3 | +| $0$ | $\text{eml}(1,\text{eml}(\text{eml}(1,1),1))$(即 $\ln(1)$) | 3 | +| $-1$ | 复杂组合 | 15-17 | +| $x+y$ | 复杂组合 | 19-27 | +| $x\times y$ | 复杂组合 | 17-41 | + +## 变体算子 + +$$\begin{align} +\text{eml}(x,y) &= \exp(x) - \ln(y) & \text{需常量 } 1 \\ +\text{edl}(x,y) &= \exp(x) / 
\ln(y) & \text{需常量 } e \\ +-\text{eml}(y,x) &= \ln(x) - \exp(y) & \text{需常量 } -\infty +\end{align}$$ + +## 约化历程 + +从 36 个原始操作到 EML 的逐步约化: + +1. **Base-36** — 标准科学计算器 (36 个原始操作) +2. **Calc 3** — 保留 $\exp,\ln,-x,1/x,+$ (6 个) +3. **Calc 2** — 保留 $\exp,\ln,-$ (4 个) +4. **Calc 1** — 使用 $x^y,\log_x y$ 和常量 $e$ 或 $\pi$ (4 个) +5. **Calc 0** — 使用 $\exp$ 和 $\log_x y$ (3 个) +6. **EML** — 单一二元算子 + 常量 1 (2 个) + +## 应用场景 + +### 符号回归 +EML 树可作为"主公式"架构: +- 构造固定深度的完整二叉树 +- 每个输入是 $1$、变量 $x$ 或子树结果的线性组合 +- 使用梯度优化(Adam)训练参数 +- 训练后将权重"吸附"到 0/1 精确值 + +### 模拟电路 +EML 可作为模拟计算的基本构建块,类似于运算放大器。 + +### 形式化验证 +- 在 Mathematica 和 IEEE754 浮点中工作良好 +- 在 Lean 4 中遇到挑战(因 $\ln(0)=0$ 的"垃圾值"定义) +- 需要处理扩展实数 ($\pm\infty$) 和复数分支切割 + +## 与符号回归的联系 + +EML 树表示使得 [[symbolic-regression]] 可通过梯度下降而非组合搜索实现: + +1. **可训练电路**:EML 树成为可微分计算图 +2. **标准优化器**:Adam 等梯度方法可优化树参数 +3. **精确恢复**:在浅层深度(≤4)时,该方法可从数值数据恢复闭式初等函数 +4. **损失地形**:统一结构相比异构表达式树可能提供更优的优化地形 + +## 与布尔逻辑的类比 + +| 方面 | 布尔逻辑 | 连续数学 | +|------|----------|----------| +| 通用原语 | NAND/NOR 门 | **EML 算子** | +| 元数 | 2 输入 | 2 输入 | +| 完备性 | 所有布尔函数 | 所有初等函数 | +| 结构 | 统一门网络 | 统一 EML 树 | +| 搜索空间 | 离散 | 连续(可微) | + +## 研究意义 + +1. **神经-符号集成**:桥接神经网络(可微)与符号数学 +2. **发现方法**:通过系统穷举搜索发现——暗示可能存在其他通用原语 +3. **科学发现**:有潜力从数据中自动发现物理定律 +4. **教育意义**:暗示微积分/分析教学的极简基础 + +## 开放问题 + +1. **无常量 Sheffer 算子** — 是否存在不需要区分常量的二元算子? +2. **一元 Sheffer 算子** — 是否存在同时作为激活函数和初等函数生成器的一元算子? +3. **更好性质的变体** — 是否存在非指数渐近、无定义域问题的类似算子? +4. **连续族** — EML 是否属于一个更大的连续算子族? +5. **最小深度** — 特定函数所需的最小 EML 树深度是多少? +6. **多维推广** — 该方法能否扩展到多元函数和偏微分方程? +7. **泛化影响** — EML 表示如何影响学习模型的泛化能力? 
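The basic constructions listed earlier on this page can be checked numerically. The sketch below is a minimal illustration using Python's standard `cmath` module, since intermediate values may be complex; the helper names `exp_via_eml` and `ln_via_eml` are hypothetical and only wrap the expressions $\text{eml}(x,1)$ and $\text{eml}(1,\text{eml}(\text{eml}(1,x),1))$.

```python
import cmath
import math


def eml(x: complex, y: complex) -> complex:
    """The EML operator: exp(x) - ln(y), evaluated over the complex numbers."""
    return cmath.exp(x) - cmath.log(y)


def exp_via_eml(x: complex) -> complex:
    """e^x as eml(x, 1), because ln(1) = 0."""
    return eml(x, 1)


def ln_via_eml(x: complex) -> complex:
    """ln(x) as eml(1, eml(eml(1, x), 1)).

    Inner step:  eml(1, x)         = e - ln(x)
    Middle step: eml(e - ln(x), 1) = exp(e - ln(x)) = e^e / x
    Outer step:  eml(1, e^e / x)   = e - (e - ln(x)) = ln(x)
    """
    return eml(1, eml(eml(1, x), 1))


if __name__ == "__main__":
    print(abs(eml(1, 1) - math.e))            # distance from the constant e
    print(abs(exp_via_eml(2) - math.exp(2)))  # distance from e^2
    print(abs(ln_via_eml(5) - math.log(5)))   # distance from ln(5)
    print(abs(ln_via_eml(1)))                 # ln(1), i.e. the constant 0
```

The check is restricted to positive real inputs. For negative or complex arguments the result depends on which branch of the logarithm the floating-point rounding lands on, which is the branch-cut issue noted under formal verification above.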
+ +## 相关页面 + +- [[odrzywolek-eml-single-operator]] — EML 算子论文 +- [[symbolic-regression]] — 应用领域 +- [[computerized-adaptive-testing]] — CRLB 相关应用 +- [[cramer-rao-lower-bound]] — Fisher 信息与参数估计 diff --git a/concepts/formal-verification.md b/concepts/formal-verification.md new file mode 100644 index 0000000..ea95917 --- /dev/null +++ b/concepts/formal-verification.md @@ -0,0 +1,54 @@ +--- +title: "Formal Verification (形式化验证)" +created: 2025-04-15 +updated: 2025-04-15 +type: concept +tags: [concept, mathematics, logic, ai-mathematics, verification] +sources: [raw/papers/tao-ai-mathematical-methods-2026.md] +--- + +# Formal Verification (形式化验证) + +## 定义 + +**Formal Verification** 是使用形式化方法(如一阶逻辑、集合论)来验证数学证明或计算机程序正确性的过程。 + +## 历史背景 + +数学传统上有客观的证明标准: +- 从欧几里得到二十世纪初的基础 +- 尽管如此,人类数学家的论证通常不达到完美严格的理想 +- 错误是常见的,有些被修正,有些成为 "folklore" + +## 形式化验证的局限 + +[[Terence Tao]] 在其论文中指出了形式化验证的两个关键局限: + +### 1. 翻译问题 +Formal verification only certifies that a formalized argument establishes a formal mathematical statement, but does not rule out errors in translation between the formal statement and the original intended statement. + +**Example** (陶哲轩的费马大定理例子): +- 费马大定理断言:对于 $n > 2$,方程 $a^n + b^n = c^n$ 没有自然数解 +- 隐含假设:自然数从 1 开始,而非 0 +- 如果 AI 错误地允许 $a, b, c$ 为 0,可能"证明"费马大定理是错误的! + +### 2. 
无法捕捉 "Penumbra" +即使形式化验证可以确保推理的正确性,它无法捕捉: +- **Heuristics** 启发式 - 为什么这个方法有效 +- **Motivation** 动机 - 为什么要研究这个问题 +- **Context** 背景 - 如何广泛地理解这个结果 +- **Narrative** 叙事 - 证明的策略和构思 + +## AI 时代的意义 + +[[Terence Tao]] 认为: +- AI 可以自动化形式化证明的生成 +- 但这可能产生 "odorless proofs"(无味证明):技术上正确,但缺乏启发性 +- 人类数学家需要专注于那些不容易自动验证的方面 + +## 关联页面 + +- [[Mathematical methods and human thought in the age of AI]] - 详细讨论 +- [[Terence Tao]] - 该概念的主要阐述者 +- [[lean-mathlib]] - 论文提及的大型形式化数学库 +- [[smell-test]] - "气味测试"概念 diff --git a/concepts/gravitino-unified-metadata.md b/concepts/gravitino-unified-metadata.md new file mode 100644 index 0000000..fa5dc6d --- /dev/null +++ b/concepts/gravitino-unified-metadata.md @@ -0,0 +1,35 @@ +--- +title: "Gravitino 统一元数据管理" +created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [system-design, tooling] +sources: [raw/articles/oppo-multimodal-data-lake-2026.md] +--- + +# Gravitino 统一元数据管理 + +**应用案例:** OPPO 多模态数据湖 (2026) + +## 背景 +在构建多模态数据湖初期,OPPO 面临算法数据散落在数百 PB 的 PB 级脚本中,缺乏归属人、使用情况和依赖关系的管理,导致严重的元数据混乱和数据滥用问题。 + +## 核心能力 + +1. **统一 Catalog**:支持多引擎友好,实现 Hive 表与 Lance 表在同一套目录下的统一管理 +2. **多云分布支持**:适配混合云模式(自建机房 + 阿里云),数据分布对业务无感,简化表与数据迁移 +3. 
**数据资产全局可感知**:实现目录归属人、每日账单、上下游依赖关系的精准归因,数据治理清晰可控 + +## 落地策略 +- **收口机制**:强制所有新增目录必须通过 Gravitino 访问,否则拒绝 +- **存量转换**:通过控制增量、逐步转换存量的方式,最终将所有元数据收归统一平台 + +## 收益 +- 用户侧:一次查询、少搬数据、权限统一 +- 架构侧:元数据集中、易扩展、易治理 +- 支持联邦查询:单条 SQL 跨 Hive/Lance 表 JOIN + +## 相关概念 + +- [[oppo-multimodal-data-lake]] — OPPO 数据湖实践 +- [[curvine-distributed-cache]] — 配套加速层 Curvine diff --git a/concepts/human-agent-trust.md b/concepts/human-agent-trust.md new file mode 100644 index 0000000..5ad1f52 --- /dev/null +++ b/concepts/human-agent-trust.md @@ -0,0 +1,38 @@ +--- +title: "人机信任 (Human-Agent Trust)" +created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [alignment, research] +sources: [raw/papers/li-amd-human-perception-2026.md] +--- + +# 人机信任 (Human-Agent Trust) + +## 背景 + +随着 LLM Agent 在软件开发、医疗等高风险领域成为受信任的副驾驶(copilots),人机信任问题从理论走向实践。信任的建立与滥用构成了新的安全挑战。 + +## 核心矛盾 + +- **信任的必要性**:Agent 需要一定的用户信任才能有效协作 +- **信任的脆弱性**:过度信任导致用户对 Agent 输出缺乏批判性验证 +- **领域专家悖论**:专家在自身领域可能更倾向于信任工具的输出,反而在特定场景下更易受 AMD 攻击 + +## 研究进展 + +- **HAT-Lab** (Li et al., 2026):首个高保真人机信任实验平台,涵盖 9 个真实场景 +- **认知失败模式**:识别了 6 种用户在面对欺骗性 Agent 时的认知失效路径 +- **经验学习**:通过模拟体验,用户可显著提高对 AMD 的警惕性(>90%) + +## 防御设计原则 + +1. **可验证性**:Agent 的输出应易于人类交叉验证 +2. **低成本警告**:安全警告应中断工作流但验证成本低 +3. 
**信任校准**:帮助用户建立对 Agent 能力的准确预期,避免信任过度或信任不足 + +## 相关概念 + +- [[agent-mediated-deception]] — AMD 攻击与防御 +- [[human-centered-ai]] — 以人为中心的 AI 哲学 +- [[li-amd-human-perception]] — 实证研究论文 diff --git a/concepts/human-centered-ai.md b/concepts/human-centered-ai.md new file mode 100644 index 0000000..70474c3 --- /dev/null +++ b/concepts/human-centered-ai.md @@ -0,0 +1,43 @@ +--- +title: "Human-Centered AI (以人类为中心的 AI)" +created: 2025-04-15 +updated: 2025-04-15 +type: concept +tags: [concept, ai-philosophy, alignment, llm, deep-learning] +sources: [raw/papers/tao-ai-mathematical-methods-2026.md] +--- + +# Human-Centered AI (以人类为中心的 AI) + +## 定义 + +**Human-Centered AI (HCAI)** 是一种 AI 发展和应用的哲学框架,强调 AI 工具的设计和使用应当以增强人类能力、满足人类需求和提升人类生活质量为核心目标。 + +**核心原则**(来自 [[Terence Tao]] 和 [[Tanya Klowden]]): +1. AI 是人类历史上为促进思想的创造、组织和传播而发展的工具的自然演进 +2. 必须确保 AI 的发展和应用保持**根本上以人类为中心** +3. 创新应以满足人类需求为导向 +4. 增进人类思维和理解能力 + +## 与其他 AI 哲学的区别 + +| 对比取向 | 该取向的焦点 | 以人类为中心的立场 | +|-------|------|------------| +| 技术决定论 | 技术自身的发展 | 技术为人类服务 | +| 效率优先 | 自动化和取代人类 | 增强人类能力 | +| 工具主义 | AI 作为独立实体 | AI 作为人类工具 | + +## 在数学中的应用 + +在 [[Mathematical methods and human thought in the age of AI]] 中,陶哲轩提出: + +- AI 可以处理费力的计算,但人类数学家应专注于启发式、创造性的工作 +- "Smell Test"(气味测试):好的数学不仅要正确,还要有启发性 +- 不能让 AI 的 "odorless proofs"(无味证明)取代人类的理解和洞察 + +## 关联页面 + +- [[Mathematical methods and human thought in the age of AI]] - 详细阐述以人类为中心 AI 的论文 +- [[Terence Tao]] - 该概念的主要倡导者之一 +- [[alignment]] - AI 对齐/安全 +- [[ai-philosophy]] - AI 哲学 diff --git a/concepts/knowledge-bank.md b/concepts/knowledge-bank.md new file mode 100644 index 0000000..179f26f --- /dev/null +++ b/concepts/knowledge-bank.md @@ -0,0 +1,96 @@ +--- +title: Knowledge Bank — AI 辅助开发时代的知识管理系统 +created: 2026-04-16 +updated: 2026-04-17 +type: concept +tags: [knowledge-management, open-source, multi-agent] +sources: [raw/articles/knowledge-bank-ai-dev-2026.md] +--- + +# Knowledge Bank + +面向 AI 辅助开发时代的知识管理系统,通过自动捕获、结构化存储和智能检索,让开发团队的知识真正流动起来。 + +项目仓库: 
[gabrywu-public/knowledge-bank](https://github.com/gabrywu-public/knowledge-bank) + +## 核心洞察 + +### 转变一:知识受众从"人"变为"机器" + +传统知识管理假设知识是给人阅读的(精美文档、结构化 wiki、详细注释),但现实中开发者不会主动看文档,即使看了也记不住、找不到、或已过时。 + +在 AI 辅助开发时代,**真正的知识消费者是 AI 代码助手**(Claude Code、Cursor、GitHub Copilot)。知识需要结构化、情境化、可检索的格式,让 AI 能快速理解和应用。 + +### 转变二:三维知识分类体系 + +不再按主题分类,而是采用 **作用域 + 来源 + 类型** 的三维分类: + +| 维度 | 分类 | 说明 | +|------|------|------| +| **作用域 (Scope)** | 个人 / 项目 / 组织 | 知识的共享边界,避免知识冲突,实现精准注入 | +| **来源 (Source)** | AI 观察 > 架构师决策 > Reviewer 偏好 > 开发者经验 | 知识的权威性权重;AI 观察因来自实际代码、可验证、实时性而权重最高 | +| **类型 (Type)** | 代码模式 / 架构决策 / 配置偏好 / 陷阱警示 / API 用法 | 知识的应用方式 | + +**关键设计:AI 观察的可信度最高** —— 这违反直觉但合理,因为 AI 观察直接来自实际代码(可追溯到 commit),反映当前真实状态,而非人为偏好或可能过时的文档。 + +### 转变三:知识生命周期重构 + +从 **"写作→发布→被遗忘→过时→删除"** 转变为 **"捕获→检索→应用→收集"**: + +- **零摩擦捕获**: 不需要开发者专门写文档,知识在开发过程中自动提取 +- **情境化检索**: 不是被动等待查询,而是主动在需要时注入相关知识 +- **智能去重**: 通过多维度相似度评分(标题 40% + 摘要 30% + 内容 20% + 上下文 10%)自动合并 +- **持续进化**: 知识库随项目发展自动更新和优化 + +## 技术架构 + +### Fork Context(上下文隔离架构) + +知识操作(检测、去重、评分)在分叉的隔离环境中执行,不干扰主会话: + +1. **会话开始 → 知识注入**: 提取关键词 → 搜索知识 → 相关性评分 → 过滤 → 格式化注入 +2. **会话结束 → 知识收集**: 分析会话记录 → 识别有价值知识点 → 4 项资格检查 → 去重 → 创建/更新知识 + +优势:主会话保持简洁,复杂分析不干扰用户体验,可并行执行。 + +### 强制仓库关联 (Repository-Aware) + +所有知识和会话必须关联到 Git 仓库(`repository_id NOT NULL`),确保数据完整性和精准检索。 + +### 完整会话追踪 + +记录每次开发会话的完整上下文:session_id、仓库、分支、commit、工具使用、文件修改等。 + +## 知识生命周期七阶段 + +Knowledge Bank 将知识管理融入软件开发全流程,形成"生长的枝干": + +1. **需求分析**: 自动检索历史需求知识,注入相关业务规则 +2. **架构设计**: 自动注入项目架构规范,收集新的设计决策 +3. **编码开发**: 自动注入编码规范,识别新的代码模式 +4. **测试验证**: 自动注入已知陷阱,收集新的 edge case +5. **Code Review**: AI 辅助审查,更新 Review 规则 +6. **部署运维**: 基于历史故障经验自动诊断,收集运维知识 +7. 
**迭代优化**: 追溯完整知识链路,指导优化决策 + +## 与传统知识管理的对比 + +| 维度 | 传统方式 | Knowledge Bank | +|------|----------|----------------| +| 受众 | 人 | AI(+ 人) | +| 载体 | 静态文档 | 动态上下文 | +| 获取方式 | 主动查询 | 自动注入 | +| 维护方式 | 人工编写 | 自动捕获 | +| 知识形态 | 散落的金子(孤立、过时) | 生长的枝干(互联、进化) | + +## 相关概念 + +- **多 Agent 工作流**: Knowledge Bank 的多阶段知识采集机制本质上是一种 agent 工作流 +- **持久化知识编译**: 与 Karpathy 的 LLM Wiki 模式形成互补——Knowledge Bank 侧重 AI 辅助开发场景的自动化知识捕获,llm-wiki 侧重持久化知识编译 +- [[computerized-adaptive-testing]] — CAT 的自适应选题本质上是知识注入的精准化:在正确的时间向正确的对象注入正确的测试项,与 Knowledge Bank 的情境化检索有相同的设计哲学 + +## 开放问题 + +- Knowledge Bank 的三维分类体系是否可扩展到非代码领域(如科研、写作)? +- AI 观察的"最高可信度"假设在代码存在 anti-pattern 时是否仍然成立? +- 知识去重的相似度阈值(0.85 合并 / 0.60 提示)是否经过实证验证? diff --git a/concepts/kvcache-transfer.md b/concepts/kvcache-transfer.md new file mode 100644 index 0000000..b103cf9 --- /dev/null +++ b/concepts/kvcache-transfer.md @@ -0,0 +1,38 @@ +--- +title: "KVCache 传输与优化" +created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [inference, system-design, performance] +sources: [raw/papers/qin-prfaas-cross-datacenter-2026.md] +--- + +# KVCache 传输与优化 (KVCache Transfer) + +## 定义 + +KVCache 是 LLM 推理过程中缓存的 Key-Value 状态,用于避免重复计算。KVCache 传输指在分离式推理架构中将 prefill 阶段生成的 KVCache 移动到 decode 节点的过程。 + +## 传输瓶颈 + +- **体积巨大**:Dense-attention 模型的 KVCache 大小与序列长度和模型参数量成正比 +- **带宽要求**:传统架构依赖 RDMA 等低延迟高带宽网络 +- **延迟敏感**:传输延迟直接影响 TTFT(Time to First Token) + +## 优化方向 + +### 模型侧 +- **混合注意力架构**:通过结构化状态空间或线性注意力减少 KVCache 大小 +- **KVCache 压缩**:量化、稀疏化或蒸馏技术 +- **前缀缓存共享**:多请求共享公共前缀的 KVCache + +### 系统侧 +- **选择性传输**:仅传输必要的 KVCache 层或 token +- **带宽感知调度**:根据网络状态动态调整传输策略 +- **PrfaaS 架构**:结合模型效率与系统调度,实现跨数据中心传输 + +## 相关概念 + +- [[prefill-as-a-service]] — PrfaaS 架构中的 KVCache 传输 +- [[prefill-decode-disaggregation]] — PD 分离架构 +- [[inference-optimization]] — 推理优化技术 diff --git a/concepts/memory-caching-rnn.md b/concepts/memory-caching-rnn.md new file mode 100644 index 0000000..c5477cd --- /dev/null +++ b/concepts/memory-caching-rnn.md @@ -0,0 +1,54 @@ +--- +title: "Memory Caching (MC)" 
+created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [architecture, deep-learning, llm] +sources: [raw/papers/behrouz-memory-caching-rnn-2026.md] +--- + +# Memory Caching (MC) + +**提出者:** Behrouz et al. (2026) · arXiv:2602.24281 + +## 定义 + +Memory Caching 是一种增强循环神经网络(RNN)的技术,通过缓存其隐藏状态的检查点(checkpoints),使 RNN 的有效记忆容量能够随序列长度动态增长。 + +## 动机 + +Transformer 成为序列建模范式的主要原因是其**记忆容量随上下文长度增长**的特性,这使得检索任务表现优异。然而,这也带来了 $O(L^2)$ 的二次复杂度。近年来研究者探索了次二次复杂度的 RNN 替代方案,但 RNN 在回忆密集型任务中表现不佳,通常归因于其**固定大小的记忆**限制。 + +## 技术原理 + +MC 的核心思想:在 RNN 前向传播过程中,定期保存隐藏状态的快照。当需要回忆历史信息时,可以从这些缓存的检查点恢复,而不是仅依赖当前隐藏状态。 + +### 四种变体 + +1. **基础 MC** — 均匀间隔缓存 +2. **门控聚合 MC** — 使用门控机制选择性地缓存重要状态 +3. **稀疏选择 MC** — 稀疏化缓存策略 +4. **深层 MC** — 应用于深层记忆模块 + +### 复杂度插值 + +MC 提供了一个可调节的超参数,控制缓存频率,从而在 $O(L)$(传统 RNN)和 $O(L^2)$(Transformer)之间实现灵活插值: +- 缓存频率 = 0 → 等价于标准 RNN +- 缓存频率 = 1 → 每步都缓存,接近 Transformer 的记忆能力 + +## 实验结果 + +- **语言建模**:MC 提升 RNN 性能 +- **长上下文理解**:MC 变体表现接近 Transformer +- **上下文回忆任务**:优于 SOTA RNN,接近 Transformer + +## 开放问题 + +- 缓存检查点的最优策略是什么? +- MC 与其他次二次架构(Mamba、RWKV)的结合效果如何? +- 在实际部署中,缓存带来的内存开销与性能增益的最佳平衡点在哪里? + +## 相关概念 + +- [[behrouz-memory-caching-rnn]] — 原始论文笔记 +- [[subquadratic-transformer-alternatives]] — 次二次 Transformer 替代方案 diff --git a/concepts/mixture-of-depths-attention.md b/concepts/mixture-of-depths-attention.md new file mode 100644 index 0000000..f908148 --- /dev/null +++ b/concepts/mixture-of-depths-attention.md @@ -0,0 +1,59 @@ +--- +title: "Mixture-of-Depths Attention (MoDA)" +created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [architecture, deep-learning, transformer] +sources: [raw/papers/zhu-moda-mixture-of-depths-2026.md] +--- + +# Mixture-of-Depths Attention (MoDA) + +**提出者:** Zhu et al. 
(2026) · arXiv:2603.15619 + +## 定义 + +MoDA 是一种改进的注意力机制,旨在解决深层 Transformer 模型中的**信号退化**问题。它允许每个注意力头在计算注意力时,不仅关注当前层的序列 KV,还能直接访问前序若干层的深度 KV,形成跨层的信息通路。 + +## 动机:信号退化 (Signal Degradation) + +在标准 Transformer 中,信息通过残差连接逐层传递。随着网络深度增加: +- 浅层提取的精细特征在多次残差更新中被逐渐"稀释" +- 深层网络难以有效利用浅层形成的关键信息 +- 简单的残差连接不足以保留所有重要特征 + +## 机制设计 + +### 核心思想 +每个注意力头的查询 $Q$ 不仅与当前层的 $K, V$ 计算注意力,还与前序 $D$ 层的 $K, V$ 计算注意力: +$$\text{MoDA}(Q_l) = \text{Softmax}\left(\frac{Q_l [K_{l-D:l}]^T}{\sqrt{d}}\right) V_{l-D:l}$$ + +### 硬件高效实现 +- **挑战**:跨层 KV 访问导致非连续内存访问,降低 GPU 利用率 +- **解决方案**:设计专门的内存访问算法,重组 KV 缓存布局 +- **性能**:在 64K 序列长度下达到 FlashAttention-2 的 97.3% 效率 + +## 实验表现 + +| 指标 | 基线 | MoDA | 提升 | +|------|------|------|------| +| 平均困惑度 (10 benchmarks) | - | -0.2 | ✓ | +| 下游任务性能 (10 tasks) | - | +2.11% | ✓ | +| FLOPs 开销 | 1.0x | 1.037x | +3.7% | + +## 归一化位置 + +- **Post-Norm** + MoDA > **Pre-Norm** + MoDA +- 这与标准 Transformer 的常见实践(Pre-Norm 更稳定)不同,表明 MoDA 改变了梯度流动特性 + +## 开放问题 + +- MoDA 与混合注意力架构的结合效果? +- 在超大规模模型(>100B)上的扩展性如何? +- 是否可以与 [[memory-caching-rnn]] 等技术结合? + +## 相关概念 + +- [[zhu-moda-mixture-of-depths]] — 原始论文 +- [[depth-scaling-llms]] — LLM 深度扩展 +- [[signal-degradation]] — 信号退化问题 diff --git a/concepts/prefill-as-a-service.md b/concepts/prefill-as-a-service.md new file mode 100644 index 0000000..51d15a8 --- /dev/null +++ b/concepts/prefill-as-a-service.md @@ -0,0 +1,59 @@ +--- +title: "Prefill-as-a-Service (PrfaaS)" +created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [inference, system-design, architecture] +sources: [raw/papers/qin-prfaas-cross-datacenter-2026.md] +--- + +# Prefill-as-a-Service (PrfaaS) + +**提出者:** Qin et al. 
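(2026) · arXiv:2604.15039 + +## Definition + +PrfaaS is a cross-datacenter LLM serving architecture. It selectively offloads long-context prefill to a separate compute-dense cluster and ships the resulting KVCache over commodity Ethernet to the local decode cluster, so that prefill and decode capacity can be scaled independently. + +## Motivation + +The conventional [[prefill-decode-disaggregation]] architecture separates the compute-bound prefill stage from the memory-bound decode stage, but it is constrained by the cost of transferring the KVCache: +- **Dense-attention models**: the KVCache is very large and requires a low-latency RDMA network +- **Hybrid-attention models**: the KVCache is much smaller, but real workload characteristics (burstiness, skewed request lengths, fluctuating bandwidth) still leave a naive externalized design exposed to congestion and low utilization + +## Architecture + +### Core Components +1. **Standalone prefill cluster**: compute-dense, dedicated to long-context prefill +2. **Local PD cluster**: runs decode after receiving the KVCache +3. **Bandwidth-aware scheduler**: adjusts the offloading policy dynamically as cross-datacenter bandwidth fluctuates +4. **Cache-aware request placement**: uses existing prefix caches to optimize request routing + +### Key Techniques +- **Selective offloading**: only long-context requests are offloaded for cross-datacenter prefill +- **Efficient KVCache transfer**: over commodity Ethernet, with no RDMA required +- **System-side and model-side co-design**: combines model-level KV efficiency with system-level scheduling + +## Performance + +Based on an internal 1T-parameter hybrid model: +- **54%** higher throughput than a homogeneous PD deployment +- **32%** higher throughput than a naive heterogeneous baseline +- modest cross-datacenter bandwidth consumption + +## Significance + +PrfaaS removes the constraint that "heterogeneous accelerators must share the same low-latency RDMA fabric", so LLM serving can be deployed more flexibly across loosely coupled clusters. This offers a new architectural pattern for cloud-native LLM serving. + +## Open Questions + +- How should the threshold for offloading prefill be chosen adaptively? +- What isolation and scheduling policies does PrfaaS need in multi-tenant environments? +- Where is the boundary of applicability for pure dense-attention models? + +## Related Concepts + +- [[qin-prfaas-cross-datacenter]] - original paper +- [[prefill-decode-disaggregation]] - PD disaggregation architecture +- [[kvcache-transfer]] - KVCache transfer optimization +- [[hybrid-attention-models]] - hybrid-attention architectures diff --git a/concepts/prefill-decode-disaggregation.md b/concepts/prefill-decode-disaggregation.md new file mode 100644 index 0000000..06899e7 --- /dev/null +++ b/concepts/prefill-decode-disaggregation.md @@ -0,0 +1,38 @@ +--- +title: "Prefill-Decode Disaggregation (PD Disaggregation)" +created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [inference, system-design, architecture] +sources: [raw/papers/qin-prfaas-cross-datacenter-2026.md] +--- + +# Prefill-Decode Disaggregation (PD Disaggregation) + +## Definition + +PD disaggregation runs the two main stages of LLM inference, **prefill** (processing the prompt, compute-bound) and **decode** (autoregressive token generation, memory-bound), on different hardware or clusters in order to improve resource utilization. + +## Evolution + +1. **Homogeneous deployment**: prefill and decode run on the same GPU, with low resource utilization +2. **PD disaggregation**: the two stages are separated so compute and memory resources can be optimized independently +3. 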
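**Cross-datacenter PD disaggregation**: the PrfaaS architecture goes further and breaks the network-domain boundary, enabling resource elasticity across datacenters + +## Core Challenges + +- **KVCache transfer cost**: dense-attention models produce a very large KVCache and need a high-bandwidth, low-latency network (RDMA) +- **Load imbalance**: prefill and decode peak at different times, but conventional architectures are constrained by network topology +- **Difficulty of heterogeneous deployment**: accelerators of different generations or types are hard to coordinate within a single network domain + +## Recent Progress + +- **Hybrid-attention architectures** (for example, models that mix in state-space or long-convolution layers such as Hyena) greatly reduce KVCache size +- **PrfaaS** (Qin et al., 2026): combines model-side KV efficiency with system-side selective offloading to achieve cross-datacenter PD disaggregation +- **Commodity Ethernet instead of RDMA**: lowers deployment cost and complexity + +## Related Concepts + +- [[prefill-as-a-service]] - the PrfaaS architecture +- [[kvcache-transfer]] - KVCache transfer optimization +- [[hybrid-attention-models]] - hybrid-attention architectures diff --git a/concepts/subquadratic-transformer-alternatives.md b/concepts/subquadratic-transformer-alternatives.md new file mode 100644 index 0000000..938cca3 --- /dev/null +++ b/concepts/subquadratic-transformer-alternatives.md @@ -0,0 +1,49 @@ +--- +title: "Subquadratic Transformer Alternatives" +created: 2026-04-19 +updated: 2026-04-19 +type: concept +tags: [architecture, deep-learning, llm] +sources: [raw/papers/behrouz-memory-caching-rnn-2026.md] +--- + +# Subquadratic Transformer Alternatives + +## Problem Statement + +The core bottleneck of the Transformer is the $O(L^2)$ compute and memory complexity of self-attention, which limits its use on long sequences. A range of subquadratic alternative architectures has emerged in recent years. + +## Main Directions + +### RNN Family +- **Classic RNN/LSTM/GRU**: $O(L)$ complexity, but fixed-size memory limits recall +- **Memory Caching (MC)**: extends RNN memory by caching checkpoints [[memory-caching-rnn]] +- **Mamba/State Space Models**: structured state spaces, $O(L)$ complexity +- **RWKV**: combines strengths of Transformers and RNNs + +### Linear Attention +- **Linear Transformers**: linearize attention through kernel methods +- **Performer**: linear attention approximated with random features + +### Other +- **Hyena**: a sequence model based on long convolutions +- **Griffin**: a hybrid of gated linear recurrences and local attention + +## Core Trade-offs + +| Architecture | Complexity | Recall capability | Parallel training | +|----------|--------|----------|----------| +| Transformer | $O(L^2)$ | ★★★★★ | ✓ | +| MC-RNN | $O(L)$ to $O(L^2)$ | ★★★★ | ✗ | +| SSM/Mamba | $O(L)$ | ★★★☆ | partial | +| Linear Attn | $O(L)$ | ★★★ | ✓ | + +## Open Questions + +- Is there an architecture that achieves both $O(L)$ complexity and Transformer-level recall? +- Can Memory Caching be generalized to other subquadratic architectures? 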
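The $O(L)$ to $O(L^2)$ range listed for MC-RNN in the trade-off table can be illustrated with a toy cost model. The sketch below is not the algorithm from the paper: it assumes, purely for illustration, that a checkpoint is cached every `interval` steps and that every step reads the current state plus all checkpoints cached so far. The function name `total_state_reads` and this read pattern are hypothetical.

```python
from typing import Optional


def total_state_reads(seq_len: int, interval: Optional[int]) -> int:
    """Count state reads over a sequence under a toy caching model.

    Assumption (not from the paper): each step reads the current hidden
    state plus every checkpoint cached so far. interval=None disables
    caching, which corresponds to a plain RNN.
    """
    reads = 0
    cached = 0
    for step in range(1, seq_len + 1):
        reads += 1 + cached  # current state + all cached checkpoints
        if interval is not None and step % interval == 0:
            cached += 1  # store a new checkpoint after this step
    return reads


if __name__ == "__main__":
    print(total_state_reads(8, None))  # 8: linear in L, no caching
    print(total_state_reads(8, 4))     # 12: sparse caching, in between
    print(total_state_reads(8, 1))     # 36: L(L+1)/2, quadratic in L
```

Under this assumed model the cost grows roughly as $L + L^2 / (2 \cdot \text{interval})$, so a larger interval moves the cost toward the RNN end of the range and `interval = 1` toward the Transformer end.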
+ +## Related Concepts + +- [[memory-caching-rnn]] - the Memory Caching technique +- [[behrouz-memory-caching-rnn]] - the original MC paper diff --git a/concepts/symbolic-regression.md b/concepts/symbolic-regression.md new file mode 100644 index 0000000..926df5f --- /dev/null +++ b/concepts/symbolic-regression.md @@ -0,0 +1,100 @@ +--- +title: "Symbolic Regression" +created: 2026-04-16 +updated: 2026-04-17 +type: concept +tags: [optimization, training, model] +sources: [raw/papers/odrzywolek-eml-universal-operator-2026.md] +--- + +# Symbolic Regression + +**Symbolic regression** is a machine learning technique that discovers explicit mathematical expressions from data, rather than fitting fixed-form models. Unlike traditional regression (which optimizes parameters within a predetermined functional form), symbolic regression searches the space of possible equation structures. + +## Core Problem + +Given data points (xᵢ, yᵢ), find a closed-form expression f such that y ≈ f(x), where f is composed of elementary operations and functions. + +**Key Distinction:** +- Traditional regression: y = β₀ + β₁x + β₂x² (form fixed, optimize β) +- Symbolic regression: Discover that y = sin(2πx) · e^(-x²) from data + +## Traditional Approaches + +### Genetic Programming + +The dominant approach historically: +- **Representation**: Expression trees with heterogeneous nodes (+, -, ×, ÷, sin, exp, etc.) 
+- **Search**: Evolutionary algorithms (mutations, crossovers) +- **Fitness**: Mean squared error or complexity-penalized metrics +- **Tools**: Eureqa, gplearn, PySR + +**Limitations:** +- Discrete search space (combinatorial explosion) +- Slow convergence for complex expressions +- No gradient information +- Brittle to hyperparameters + +### Sparse Regression (SINDy) + +- Assumes sparse linear combination from a library of candidate functions +- Uses LASSO/sparse optimization +- Faster but limited to linear combinations of basis functions + +## Gradient-Based Approaches + +Recent work enables differentiable symbolic regression: + +### EML Trees (2026) + +[[eml-universal-operator|Odrzywołek's EML representation]] enables gradient-based optimization: +- Uniform tree structure (all nodes are `eml` operators) +- Fully differentiable +- Optimizable with standard deep learning optimizers (Adam) +- Can recover exact closed forms at shallow depths (≤4) + +### Neural Symbolic Methods + +- **AI Feynman**: Combines neural network fitting with symbolic property testing +- **Symbolic GPT**: Transformer-based generation of expressions +- **Deep Symbolic Regression**: Neural networks predicting expression trees + +## Evaluation Metrics + +1. **Accuracy**: R², MSE, NMSE on held-out data +2. **Complexity**: Number of nodes, operators, or description length +3. **Pareto Frontier**: Trade-off between accuracy and simplicity +4. **Exact Recovery**: Whether the true underlying formula is found +5. **Generalization**: Performance on out-of-distribution data + +## Applications + +| Domain | Example | +|--------|---------| +| Physics | Discovering force laws, equations of state | +| Chemistry | Reaction kinetics, structure-property relationships | +| Biology | Population dynamics, gene regulatory networks | +| Engineering | System identification, control laws | +| Finance | Discovering pricing formulas, risk models | + +## Challenges + +1. 
**Scalability**: Exponential growth of expression space with size +2. **Noise Sensitivity**: Overfitting to data noise +3. **Non-uniqueness**: Multiple expressions may fit data equally well +4. **Dimensional Analysis**: Incorporating physical units/constraints +5. **Interpretability**: Balancing accuracy with human-understandable forms + +## Future Directions + +- Integration with large language models for prior knowledge +- Physics-informed constraints (conservation laws, symmetries) +- Multi-objective optimization (accuracy, simplicity, generalization) +- Real-time/online symbolic regression +- Human-in-the-loop discovery workflows + +## Related Concepts + +- [[eml-universal-operator]]: A universal operator enabling gradient-based symbolic regression +- [[andrzej-odrzywolek]]: Researcher who discovered the EML universal operator +- [[computerized-adaptive-testing]]: Dynamic item selection in CAT and adaptive search in symbolic regression are structurally similar in how they handle the exploration-exploitation trade-off diff --git a/entities/andrzej-odrzywolek.md b/entities/andrzej-odrzywolek.md new file mode 100644 index 0000000..0f9ff32 --- /dev/null +++ b/entities/andrzej-odrzywolek.md @@ -0,0 +1,63 @@ +--- +title: "Andrzej Odrzywołek" +created: 2026-04-16 +updated: 2026-04-16 +type: entity +tags: [person, research] +sources: [raw/papers/odrzywolek-eml-single-operator-2026.md] +--- + +# Andrzej Odrzywołek + +## Overview + +Polish theoretical physicist and researcher at the Institute of Theoretical Physics, Jagiellonian University. + +## Key Facts + +- **Affiliation:** Institute of Theoretical Physics, Jagiellonian University, 30-348 Krakow, Poland +- **Email:** andrzej.odrzywolek@uj.edu.pl +- **Research areas:** theoretical physics, symbolic computation, symbolic regression + +## Main Contributions + +### EML Sheffer Operator (2026) +Discovered a Sheffer-type operator for continuous mathematics, $\text{eml}(x,y) = \exp(x) - \ln(y)$, and showed that this single binary operator together with the constant 1 is enough to generate all elementary functions. The operator was found by systematic exhaustive search, and its completeness was verified with a constructive proof. + +### Symbolic Regression Method +Developed a symbolic regression method based on EML binary trees and demonstrated that gradient-based optimization can recover exact closed-form elementary functions from numerical data. + +## Tools and Code + +- **SymbolicRegressionPackage**: Mathematica package for symbolic regression, with a fast Rust implementation +- **EML Toolkit**: the EML compiler and related tools +- **Zenodo archive:** DOI: 10.5281/zenodo.19183008 + +## Publications + +1. 
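**All elementary functions from a single binary operator** (2026) +   arXiv:2603.21852 [cs.SC] +   Categories: symbolic computation, machine learning +   [PDF](raw/papers/odrzywolek-eml-universal-operator-2026.pdf) + +## Discovery Method + +The EML operator was found through systematic exhaustive search, which suggests that searching computationally for fundamental mathematical primitives is feasible. + +## Significance + +The EML operator plays a role in continuous mathematics comparable to that of NAND universality in Boolean logic. It is a foundational result with implications for: +- automated scientific discovery +- neuro-symbolic AI integration +- a minimal foundation for calculus + +## External Links + +- arXiv author page: https://arxiv.org/search/cs?searchtype=author&query=Odrzywo%C5%82ek,+A +- Code repository: https://zenodo.org/records/19183008 + +## Related Pages + +- [[odrzywolek-eml-single-operator]] - the EML operator paper +- [[eml-operator]] - the core mathematical concept +- [[symbolic-regression]] - symbolic regression diff --git a/entities/papers/tao-klowden-ai-mathematical-methods.md b/entities/papers/tao-klowden-ai-mathematical-methods.md new file mode 100644 index 0000000..bdc69cf --- /dev/null +++ b/entities/papers/tao-klowden-ai-mathematical-methods.md @@ -0,0 +1,72 @@ +--- +title: "Mathematical methods and human thought in the age of AI" +created: 2025-04-15 +updated: 2025-04-15 +type: paper +tags: [paper, ai-philosophy, mathematics, human-centered-ai, llm, deep-learning] +sources: [raw/papers/tao-ai-mathematical-methods-2026.md] +arXiv: "2603.26524" +authors: [[Terence Tao]], [[Tanya Klowden]] +published: 2026-03-27 +--- + +# Mathematical methods and human thought in the age of AI + +Authors: [[Terence Tao]], [[Tanya Klowden]] +arXiv: [2603.26524](https://arxiv.org/abs/2603.26524) Published: 2026-03-27 +Length: 27 pages + +## Abstract + +Artificial intelligence (AI) is the popular name for a spectrum of computer tools designed to perform increasingly complex cognitive tasks. The paper examines the impact of AI on traditional philosophical questions, focusing on its application in mathematics and on the real-world consequences of its wider use. + +**Central thesis**: AI is a natural evolution of the tools humanity has developed throughout history to support the creation, organization, and dissemination of ideas, and it must be developed and applied in a human-centered way. + +## Main Sections + +### 1. Definitions and Background +- AI is defined as a spectrum of computer tools that perform complex cognitive tasks +- The spectrum runs from [[LLM]] and diffusion models to traditional "GOFAI" (such as automated theorem provers and chess engines) +- There has been little discussion of why these tools should be developed and deployed so quickly + +### 2. Historical Analogies: Is This Time Different? +- Automation technology is not new (for example the printing press, computers, [[LaTeX]]) +- Earlier technologies mainly affected how output was disseminated, not the act of creation itself +- Modern AI can automate the creative process itself, producing an unprecedented decoupling between the outward form of content and the values that went into creating it + +### 3. Mathematics as a "Sandbox" for AI Use +- Mathematics has more mature foundations, which makes it suitable for exploring hypothetical scenarios +- [[Frontier AI]] models can now solve increasingly complex mathematical problems +- AI may surpass human experts on some tasks while making serious errors on basic concepts + +### 4. 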
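Standards of Proof and the "Smell Test" +- Mathematics has traditionally had objective standards of proof, from Euclid to the foundational work of the early twentieth century +- **"Smell test"**: a good proof does more than present logical reasoning; it also provides understanding and insight +- [[Formal Verification]] can check correctness but cannot capture the "penumbra" (heuristic and experience-based reasoning) + +### 5. The Evolution of AI-Assisted Mathematics +- The mathematical community has adapted to earlier technological challenges (such as the computer-assisted proofs of the four color theorem and the Kepler conjecture) +- The burden of proof will shift increasingly to computers +- Human mathematicians may focus more on the "soft" aspects: heuristics, motivation, experimental evidence + +## Key Points + +1. **AI is an evolution of tools**, not a replacement for humans +2. **AI must be human-centered**: innovative solutions should meet human needs, improve quality of life, and extend human thinking +3. **Limits of formal verification**: it can only check formal correctness and cannot convey understanding or insight +4. **The value of the "smell test"**: good mathematics is not only correct but also instructive and illuminating + +## Links to Other Pages + +- [[human-centered-ai]] - the central theme of this paper: human-centered AI development +- [[formal-verification]] - the role and limits of formal verification +- [[ai-mathematics]] - the intersection of AI and mathematics +- [[terence-tao]] - co-author, renowned mathematician +- [[llm]] - large language models +- [[alpha-proof]] - an AI system for mathematical proof mentioned in the paper + +## Key Quotes (paraphrased) + +> AI is a natural evolution of the tools humanity has developed throughout history to support the creation, organization, and dissemination of ideas. + +> Formal verification can only show that a formal argument establishes a formal mathematical statement; it cannot rule out translation errors between the formal statement and the statement originally intended. diff --git a/entities/tanya-klowden.md b/entities/tanya-klowden.md new file mode 100644 index 0000000..f633847 --- /dev/null +++ b/entities/tanya-klowden.md @@ -0,0 +1,29 @@ +--- +title: "Tanya Klowden" +created: 2025-04-15 +updated: 2025-04-15 +type: entity +tags: [person, arts, humanities] +sources: [raw/papers/tao-ai-mathematical-methods.md] +--- + +# Tanya Klowden + +**Background**: arts and humanities + +**Co-authored paper**: +- [[Mathematical methods and human thought in the age of AI]] (with [[Terence Tao]], 2026) - examines the impact of AI on philosophy, mathematics, and the humanities + +**Research interests**: +- applications and impact of AI in the humanities +- philosophical questions raised by AI +- dialogue and crossover with scientific fields such as mathematics + +**Distinctive perspective**: +As a scholar who studies AI from the standpoint of the arts and humanities, Klowden brings a perspective that complements that of the mathematician [[Terence Tao]], and the paper covers questions about AI use ranging from the arts to the sciences. + +## Related Pages + +- [[Mathematical methods and human thought in the age of AI]] - co-authored paper +- [[Terence Tao]] - co-author +- [[human-centered-ai]] - central theme of the paper diff --git a/entities/terence-tao.md b/entities/terence-tao.md new file mode 100644 index 0000000..e52678e --- /dev/null +++ b/entities/terence-tao.md @@ -0,0 +1,48 @@ +--- +title: "Terence Tao (陶哲轩)" +created: 2025-04-15 +updated: 2025-04-15 +type: entity +tags: [person, mathematics, analysis, number-theory] +sources: [raw/papers/tao-klowden-ai-mathematical-methods.md] +--- + +# 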
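Terence Tao (陶哲轩) + +**Born**: July 17, 1975, Adelaide, Australia + +**Current position**: Professor of Mathematics, University of California, Los Angeles + +**Areas of expertise**: +- Harmonic analysis +- Partial differential equations (PDEs) +- Analytic number theory +- Combinatorics +- Boltzmann-equation functionals + +## Major Achievements + +- **Fields Medal, 2006**: awarded for contributions to partial differential equations, combinatorics, harmonic analysis, and additive number theory +- Widely regarded as one of the greatest living mathematicians +- Has taken part in several major collaborative mathematics efforts (such as the Polymath projects) + +## Relationship with AI + +Tao is one of the pioneers in actively exploring the use of AI tools in mathematical research: + +- argues that AI is a **natural evolution of human tools** and should be used in a human-centered way +- discusses the notion of a "smell test" in the paper [[Mathematical methods and human thought in the age of AI]] +- stresses the limits of formal verification: it can check correctness but cannot convey understanding or inspiration +- holds that AI can serve as an assistant to mathematicians but has to be used with care + +## Main Papers + +- [[Mathematical methods and human thought in the age of AI]] (with [[Tanya Klowden]], 2026) - an in-depth reflection on AI and the philosophy of mathematics +- Several hundred other research papers in mathematics + +## Related Pages + +- [[Mathematical methods and human thought in the age of AI]] - paper on AI and mathematics +- [[Tanya Klowden]] - co-author of that paper +- [[ai-mathematics]] - the intersection of AI and mathematics +- [[human-centered-ai]] - human-centered AI diff --git a/index.md b/index.md new file mode 100644 index 0000000..7cced46 --- /dev/null +++ b/index.md @@ -0,0 +1,49 @@ +# Wiki Index + +> Table of contents. Every wiki page is listed by type with a one-line summary. +> Read this file first to find the pages relevant to any query. +> Last updated: 2026-04-20 | Total pages: 28 + +## Entities + +- [[andrzej-odrzywolek]] - Polish theoretical physicist, discoverer of the EML Sheffer operator +- [[tanya-klowden]] - arts and humanities scholar, co-author with Terence Tao of a paper on the philosophy of AI +- [[terence-tao]] - renowned mathematician, Fields Medalist, early explorer of AI in mathematics + +## Concepts +- [[gravitino-unified-metadata]] - Gravitino unified metadata management +- [[curvine-distributed-cache]] - Curvine cloud-native distributed cache +- [[mixture-of-depths-attention]] - MoDA cross-layer attention mechanism +- [[depth-scaling-signal-degradation]] - depth scaling of LLMs and the signal degradation problem +- [[prefill-as-a-service]] - PrfaaS cross-datacenter LLM serving architecture +- [[prefill-decode-disaggregation]] - evolution of the Prefill-Decode disaggregation architecture +- [[kvcache-transfer]] - KVCache transfer and optimization techniques +- [[agent-mediated-deception]] - Agent-Mediated Deception (AMD) attack patterns and defenses +- [[human-agent-trust]] - how human-agent trust is built and where it is vulnerable +- [[memory-caching-rnn]] - technique that extends effective memory capacity by caching checkpoints of RNN hidden states +- [[subquadratic-transformer-alternatives]] - survey of subquadratic alternatives to the Transformer +- [[ai-mathematics]] - research at the intersection of AI and mathematics, using mathematics as a "sandbox" for exploring AI capabilities +- [[eml-operator]] - the EML (Exp-Minus-Log) operator, a Sheffer operator for continuous mathematics +- 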
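[[formal-verification]] - using formal methods to verify the correctness of mathematical proofs +- [[human-centered-ai]] - a philosophy of AI development whose core goal is to augment human capabilities +- [[computerized-adaptive-testing]] - survey of computerized adaptive testing: how ML methods improve measurement models, item selection, item bank construction, and test control +- [[cramer-rao-lower-bound]] - theoretical lower bound on the variance of unbiased estimators, given by the inverse of the Fisher information; a mathematical foundation for MLE and CAT +- [[knowledge-bank]] - knowledge management system for the era of AI-assisted development, with 3D classification (scope, source, type) and an automatic capture lifecycle +- [[symbolic-regression]] - machine learning technique that discovers mathematical expressions from data + +## Articles +- [[oppo-multimodal-data-lake]] - OPPO multimodal data lake architecture in practice (Gravitino and Curvine) + +## Comparisons + +## Papers +- [[zhu-moda-mixture-of-depths]] - MoDA: a cross-layer attention mechanism that addresses signal degradation in depth scaling (arXiv:2603.15619, 2026) +- [[qin-prfaas-cross-datacenter]] - PrfaaS: cross-datacenter LLM serving architecture in which the KVCache can be transferred between clusters (arXiv:2604.15039, 2026) +- [[li-amd-human-perception]] - empirical study of human perceptual vulnerability to deception by LLM agents (arXiv:2602.21127, 2026) +- [[behrouz-memory-caching-rnn]] - Memory Caching: growing memory through cached RNN hidden states (arXiv:2602.24281, 2026) +- [[odrzywolek-eml-single-operator]] - the EML operator: a single binary operator that generates all elementary functions (arXiv:2603.21852, 2026) +- [[Mathematical methods and human thought in the age of AI]] - in-depth paper by Terence Tao and Klowden on the philosophy of AI (arXiv:2603.26524, 2026) + +## Books + +## Queries diff --git a/log.md b/log.md new file mode 100644 index 0000000..2739da1 --- /dev/null +++ b/log.md @@ -0,0 +1,111 @@ +# Wiki Log + +> Chronological record of all wiki operations. Append-only. +> Format: `## [YYYY-MM-DD] action | subject` +> Action types: ingest, update, query, lint, create, archive, delete +> When this file exceeds 500 entries, rotate it: rename to log-YYYY.md and start a new file. + +## [2026-04-20] merge | Merge /home/ubuntu/wiki into /home/ubuntu/wikiplace +- Source: the old wiki path (default fallback path ~/wiki) +- Action: merged files that existed only in wiki into wikiplace +- Files added: +  - `concepts/computerized-adaptive-testing.md` - CAT survey +  - `concepts/cramer-rao-lower-bound.md` - CRLB, lower bound for parameter estimation +  - `concepts/knowledge-bank.md` - knowledge management system for AI-assisted development +  - `concepts/symbolic-regression.md` - symbolic regression +  - `raw/articles/knowledge-bank-ai-dev-2026.md` - original Knowledge Bank article from a WeChat official account +  - `raw/papers/hbs-cramerrao-bound-notes.md` - summary of the HBS CRLB training material +  - `raw/papers/zhuang-catsurvey-ml-2024.md` - metadata for the CAT survey paper +  - 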
`raw/papers/cramerrao-bound-notes.pdf` - PDF of the HBS CRLB training material +  - `raw/papers/odrzywolek-eml-universal-operator-2026.pdf` - PDF of the EML paper +- Merged updates: +  - `concepts/eml-operator.md` - added the link to symbolic regression, the Boolean-logic analogy, research significance, and more open questions +  - `entities/andrzej-odrzywolek.md` - added publications, discovery method, significance, and external links +- Updated index.md: total pages 24 → 28 +- Updated log.md: appended the merge record + +## [2025-04-15] create | Wiki initialization +- Domain: mathematics research, AI/ML research, programming, study notes and reading material +- Created structure: SCHEMA.md, index.md, log.md +- Directory layout: raw/, entities/, concepts/, comparisons/, queries/ + +## [2025-04-15] ingest | Mathematical methods and human thought in the age of AI +- Source: arXiv:2603.26524 +- Authors: [[Terence Tao]], [[Tanya Klowden]] +- Saved to: raw/papers/tao-ai-mathematical-methods-2026.md +- Pages created: +  - entities/papers/tao-klowden-ai-mathematical-methods.md +  - entities/terence-tao.md +  - entities/tanya-klowden.md +  - concepts/human-centered-ai.md +  - concepts/formal-verification.md +  - concepts/ai-mathematics.md +- Updated index.md: total pages 6 + +## [2026-04-16] ingest | All elementary functions from a single binary operator +- Source: arXiv:2603.21852 [cs.SC] +- Author: [[Andrzej Odrzywołek]] +- Saved to: raw/papers/odrzywolek-eml-single-operator-2026.md +- Pages created: +  - papers/odrzywolek-eml-single-operator.md - summary of the EML operator paper +  - entities/andrzej-odrzywolek.md - author entity page +  - concepts/eml-operator.md - EML operator concept page +- Updated index.md: total pages 9 +- Key concepts: EML Sheffer operator, binary-tree grammar, symbolic regression, completeness for continuous mathematics + +## [2026-04-19] ingest | Memory Caching: RNNs with Growing Memory +- Source: arXiv:2602.24281 [cs.LG] +- Authors: Ali Behrouz, Zeman Li, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni +- Saved to: raw/papers/behrouz-memory-caching-rnn-2026.md +- Pages created: +  - papers/behrouz-memory-caching-rnn.md - notes on the MC paper +  - concepts/memory-caching-rnn.md - the Memory Caching technique in detail +  - concepts/subquadratic-transformer-alternatives.md - survey of subquadratic Transformer alternatives +- Updated index.md: total pages 12 +- Key concepts: Memory Caching, growing memory for RNNs, subquadratic complexity, hidden-state caching, gated aggregation + +## [2026-04-19] ingest | "Are You Sure?": Human Perception Vulnerability in LLM Agents +- Source: arXiv:2602.21127 [cs.HC] +- Authors: Xinfeng Li, 
Shenyu Dai, Kelong Zheng, Yue Xiao, Gelei Deng, Wei Dong, Xiaofeng Wang +- Saved to: raw/papers/li-amd-human-perception-2026.md +- Pages created: +  - papers/li-amd-human-perception.md - notes on the AMD empirical study +  - concepts/agent-mediated-deception.md - AMD attack patterns in detail +  - concepts/human-agent-trust.md - human-agent trust and vulnerability +- Updated index.md: total pages 14 +- Key concepts: Agent-Mediated Deception, HAT-Lab, cognitive failure modes, experiential learning, trust calibration + +## [2026-04-19] ingest | Prefill-as-a-Service: KVCache Goes Cross-Datacenter +- Source: arXiv:2604.15039 [cs.DC] +- Authors: Ruoyu Qin, Weiran He, Yaoyu Wang, Zheming Li, Xinran Xu, Yongwei Wu, Weimin Zheng, Mingxing Zhang +- Saved to: raw/papers/qin-prfaas-cross-datacenter-2026.md +- Pages created: +  - papers/qin-prfaas-cross-datacenter.md - notes on the PrfaaS paper +  - concepts/prefill-as-a-service.md - the PrfaaS architecture in detail +  - concepts/prefill-decode-disaggregation.md - evolution of the PD disaggregation architecture +  - concepts/kvcache-transfer.md - KVCache transfer and optimization +- Updated index.md: total pages 17 +- Key concepts: Prefill-as-a-Service, cross-datacenter deployment, KVCache transfer, hybrid attention, bandwidth-aware scheduling + +## [2026-04-19] ingest | Mixture-of-Depths Attention (MoDA) +- Source: arXiv:2603.15619 [cs.LG] +- Authors: Lianghui Zhu, Yuxin Fang, Bencheng Liao, Shijie Wang, Tianheng Cheng, Zilong Huang, Chen Chen, Lai Wei, Yutao Zeng, Ya Wang, Yi Lin, Yu Li, Xinggang Wang +- Saved to: raw/papers/zhu-moda-mixture-of-depths-2026.md +- Pages created: +  - papers/zhu-moda-mixture-of-depths.md - notes on the MoDA paper +  - concepts/mixture-of-depths-attention.md - the MoDA mechanism in detail +  - concepts/depth-scaling-signal-degradation.md - depth scaling and the signal degradation problem +- Updated index.md: total pages 21 +- Key concepts: Mixture-of-Depths Attention, signal degradation, cross-layer KV access, hardware-efficient implementation, Post-Norm advantage + +## [2026-04-19] ingest | OPPO Multimodal Data Lake in Practice (WeChat Article) +- Source: WeChat official account article (DataFun / Data for AI Meetup) +- Speaker: David (head of big data architecture at OPPO) +- Link: https://mp.weixin.qq.com/s/cBaYa04qAIGsxG1hD7ll3w +- Saved to: raw/articles/oppo-multimodal-data-lake-2026.md +- Pages created: +  - articles/oppo-multimodal-data-lake.md - summary of the article's core architecture and results +  - concepts/gravitino-unified-metadata.md - Gravitino unified metadata management +  - concepts/curvine-distributed-cache.md - Curvine distributed cache +- Updated index.md: added an Articles section, total pages 24 +- 
Key concepts: multimodal data lake, Gravitino metadata, Curvine cache, LanceDB acceleration, hybrid-cloud architecture diff --git a/papers/behrouz-memory-caching-rnn.md b/papers/behrouz-memory-caching-rnn.md new file mode 100644 index 0000000..8acddf3 --- /dev/null +++ b/papers/behrouz-memory-caching-rnn.md @@ -0,0 +1,43 @@ +--- +title: "Memory Caching: RNNs with Growing Memory" +created: 2026-04-19 +updated: 2026-04-19 +type: paper +tags: [llm, architecture, deep-learning] +sources: [raw/papers/behrouz-memory-caching-rnn-2026.md] +--- + +# Memory Caching: RNNs with Growing Memory + +**arXiv:** 2602.24281 [cs.LG] · 2026-02-27 +**Authors:** [[Ali Behrouz]], Zeman Li, Yuan Deng, Peilin Zhong, [[Meisam Razaviyayn]], [[Vahab Mirrokni]] + +## Core Contribution + +The paper proposes **Memory Caching (MC)**, which caches checkpoints of the RNN hidden state so that the effective memory capacity of the RNN grows with sequence length. The technique offers a flexible trade-off that interpolates between the fixed memory of RNNs ($O(L)$ complexity) and the growing memory of Transformers ($O(L^2)$ complexity). + +## Key Findings + +- The weak performance of RNNs on recall-intensive tasks is usually attributed to their **fixed-size memory** +- By caching checkpoints of the hidden state, MC lets the effective memory capacity of an RNN grow +- Four MC variants are proposed, including gated aggregation and sparse selective mechanisms +- MC applies to both linear and deep memory modules +- Experiments: MC improves RNN performance on language modeling and long-context understanding tasks +- On in-context recall tasks, MC variants come close to Transformers and outperform state-of-the-art RNN models + +## Complexity Analysis + +| Model type | Complexity | Memory behavior | +|----------|-----------|----------| +| Classic RNN | $O(L)$ | fixed-size memory | +| Transformer | $O(L^2)$ | grows with context | +| MC-RNN | $O(L)$ to $O(L^2)$ | tunable, flexible interpolation | + +## Related Concepts + +- [[memory-caching-rnn]] - the Memory Caching technique in detail +- [[subquadratic-transformer-alternatives]] - subquadratic alternatives to the Transformer + +## Source + +- arXiv: https://arxiv.org/abs/2602.24281 diff --git a/papers/li-amd-human-perception.md b/papers/li-amd-human-perception.md new file mode 100644 index 0000000..b472753 --- /dev/null +++ b/papers/li-amd-human-perception.md @@ -0,0 +1,36 @@ +--- +title: '"Are You Sure?": Human Perception Vulnerability in LLM Agents' +created: 2026-04-19 +updated: 2026-04-19 +type: paper +tags: [llm, alignment, benchmark, research] +sources: [raw/papers/li-amd-human-perception-2026.md] +--- + +# "Are You Sure?": Human Perception Vulnerability in LLM Agents + +**arXiv:** 2602.21127 [cs.HC] · 2026-02-24 +**Authors:** Xinfeng Li, Shenyu Dai, Kelong 
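Zheng, Yue Xiao, Gelei Deng, Wei Dong, Xiaofeng Wang + +## Core Contribution + +The first large-scale empirical study (303 participants) of human vulnerability to **Agent-Mediated Deception (AMD)**. Once an LLM agent has been compromised or hijacked, it can be turned into a weapon against its own user, and humans detect this kind of deception at a very low rate (only 8.6%). + +## Key Findings + +- **AMD definition**: compromised agents are weaponized against their own human users +- **Very low detection rate**: only 8.6% of participants noticed the AMD attack +- **Domain experts can be more susceptible**: in some scenarios domain experts showed higher susceptibility (possibly because they over-trust their tools) +- **Six cognitive failure modes**: the study identifies the paths along which users' judgment fails when facing a deceptive agent +- **Awareness-action gap**: awareness of risk often does not translate into protective behavior +- **Properties of effective defenses**: an effective warning should **interrupt the workflow** and have a **low verification cost** +- **Experiential learning works**: after experiential learning based on HAT-Lab, >90% of users who perceived the risk reported increased vigilance toward AMD + +## Research Platform: HAT-Lab + +The authors built **HAT-Lab (Human-Agent Trust Laboratory)**, a high-fidelity research platform with 9 carefully designed scenarios covering everyday and professional domains (healthcare, software development, human resources, and others). + +## Related Concepts + +- [[agent-mediated-deception]] - AMD attack patterns and defenses +- [[human-agent-trust]] - human-agent trust and vulnerability diff --git a/papers/odrzywolek-eml-single-operator.md b/papers/odrzywolek-eml-single-operator.md new file mode 100644 index 0000000..1d6d407 --- /dev/null +++ b/papers/odrzywolek-eml-single-operator.md @@ -0,0 +1,89 @@ +--- +title: "All elementary functions from a single binary operator" +created: 2026-04-16 +updated: 2026-04-16 +type: paper +tags: [paper, algorithm, concept] +sources: [raw/papers/odrzywolek-eml-single-operator-2026.md] +--- + +# All elementary functions from a single binary operator + +**arXiv:** [2603.21852](https://arxiv.org/abs/2603.21852) [cs.SC] +**Author:** [[andrzej-odrzywolek]] +**Published:** 2026-03-23 (v1), 2026-04-04 (v2) + +## Core Contribution + +The paper identifies a **Sheffer operator for continuous mathematics**: the single binary operator + +$$\text{eml}(x,y) = \exp(x) - \ln(y)$$ + +together with the constant $1$ is enough to generate all the elementary functions of a scientific calculator. This is analogous to the completeness of the NAND gate for Boolean logic in digital circuits. + +## Key Results + +### EML Completeness +- A **two-button calculator** (1, eml) can replace a 36-button scientific calculator +- It generates all arithmetic operations ($+,-,\times,/$), transcendental functions ($\sin,\cos,\log,\exp$), and constants ($e,\pi,i$) +- For example: $\exp(x) = \text{eml}(x,1)$ and $\ln(x) = \text{eml}(1,\text{eml}(\text{eml}(1,x),1))$ + +### Binary-Tree Grammar +Every EML expression is a binary tree of homogeneous nodes, with a minimal grammar: + +$$S \to 1 \mid \text{eml}(S,S)$$ + +This is isomorphic to full binary trees and Catalan structures. + +### Symbolic Regression +- EML trees can serve as trainable circuits and be optimized by gradient descent with optimizers such as Adam +- At tree depth ≤4, exact closed-form elementary functions can be recovered from numerical data +- Success rate: 100% at depth 2, about 25% at depths 3-4, <1% at depth 5 + +## Reduction History + +| Configuration | 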
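Constants | Unary | Binary | Count | +|------|------|------|------|------| +| Base-36 | 8 | 20 | 8 | 36 | +| Wolfram | $\pi,e,i$ | $\ln$ | $+,\times,\wedge$ | 7 | +| Calc 3 | none | $\exp,\ln,-x,1/x$ | $+$ | 6 | +| Calc 2 | none | $\exp,\ln$ | $-$ | 4 | +| Calc 1 | $e$ or $\pi$ | none | $x^y,\log_x y$ | 4 | +| Calc 0 | none | $\exp$ | $\log_x y$ | 3 | +| **EML** | **1** | **none** | **eml** | **2** | + +## Related Operators + +$$\begin{aligned} +\text{eml}(x,y) &= \exp(x) - \ln(y) & \text{constant } 1 \\ +\text{edl}(x,y) &= \exp(x) / \ln(y) & \text{constant } e \\ +-\text{eml}(y,x) &= \ln(x) - \exp(y) & \text{constant } -\infty +\end{aligned}$$ + +## Complexity Examples (expression-tree node counts) + +| Function | EML compiler | Direct search | +|------|-----------|---------| +| $e^x$ | 3 | 3 | +| $\ln x$ | 7 | 7 | +| $x+y$ | 27 | 19 | +| $x\times y$ | 41 | 17 | +| $\pi$ | 193 | >53 | + +## Application Directions + +1. **EML compiler**: compiles formulas into pure EML form +2. **Analog circuits**: EML as a basic building block for analog computation +3. **Symbolic regression**: a "master formula" approach based on gradient optimization +4. **Neural network interpretability**: trained weights can "snap" to exact symbolic values + +## Open Questions + +- Is there a binary Sheffer operator that does not need a distinguished constant? +- Is there a unary Sheffer operator that works both as a neural activation function and as a generator of the elementary functions? +- Is there a similar operator with better properties (non-exponential asymptotics, no domain issues)? 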
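The two identities quoted in the EML completeness results, $\exp(x) = \text{eml}(x,1)$ and $\ln(x) = \text{eml}(1,\text{eml}(\text{eml}(1,x),1))$, can be checked numerically. The sketch below is an illustration added to these notes, not code from the paper; it assumes real inputs with $x > 0$ for the logarithm, and the helper names `eml`, `exp_via_eml`, and `ln_via_eml` are hypothetical.

```python
import math


def eml(x: float, y: float) -> float:
    """eml(x, y) = exp(x) - ln(y), restricted here to real x and y > 0."""
    return math.exp(x) - math.log(y)


def exp_via_eml(x: float) -> float:
    """exp(x) = eml(x, 1), because ln(1) = 0."""
    return eml(x, 1.0)


def ln_via_eml(x: float) -> float:
    """ln(x) = eml(1, eml(eml(1, x), 1)) for x > 0.

    Inner call:  eml(1, x)        = e - ln(x)
    Middle call: eml(e - ln x, 1) = exp(e - ln x) = e**e / x
    Outer call:  eml(1, e**e / x) = e - (e - ln x) = ln(x)
    """
    return eml(1.0, eml(eml(1.0, x), 1.0))


if __name__ == "__main__":
    for value in (0.5, 1.0, 2.0, 10.0):
        # Compare each EML-built function with the math library version.
        assert math.isclose(exp_via_eml(value), math.exp(value), rel_tol=1e-12)
        assert math.isclose(ln_via_eml(value), math.log(value), abs_tol=1e-12)
    print("both identities hold on the sampled points")
```

The logarithm expression has seven nodes (three `eml` calls, three constants `1`, and one `x`), which matches the entry for $\ln x$ in the complexity table.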
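+ +## Related Pages + +- [[andrzej-odrzywolek]] - author +- [[eml-operator]] - the core mathematical concept diff --git a/papers/qin-prfaas-cross-datacenter.md b/papers/qin-prfaas-cross-datacenter.md new file mode 100644 index 0000000..7e2311e --- /dev/null +++ b/papers/qin-prfaas-cross-datacenter.md @@ -0,0 +1,38 @@ +--- +title: "Prefill-as-a-Service: KVCache Goes Cross-Datacenter" +created: 2026-04-19 +updated: 2026-04-19 +type: paper +tags: [inference, architecture, system-design, llm] +sources: [raw/papers/qin-prfaas-cross-datacenter-2026.md] +--- + +# Prefill-as-a-Service: KVCache Goes Cross-Datacenter + +**arXiv:** 2604.15039 [cs.DC] · 2026-04-16 +**Authors:** Ruoyu Qin, Weiran He, Yaoyu Wang, Zheming Li, Xinran Xu, Yongwei Wu, Weimin Zheng, Mingxing Zhang + +## Core Contribution + +The paper proposes **Prefill-as-a-Service (PrfaaS)**, a cross-datacenter LLM serving architecture. It selectively offloads long-context prefill to a separate compute-dense cluster and transfers the KVCache over commodity Ethernet to a local PD cluster for decode, so that prefill and decode capacity can be scaled independently. + +## Key Findings + +- **Limits of conventional PD disaggregation**: dense-attention models generate very large KVCache traffic, which forces prefill and decode to stay tightly coupled inside the same high-bandwidth network domain +- **The opportunity from hybrid-attention architectures**: they shrink the KVCache substantially, which makes cross-cluster KVCache transfer feasible +- **A smaller KVCache alone is not enough**: real workloads are bursty, request lengths are highly skewed, prefix caches are unevenly distributed, and cross-cluster bandwidth fluctuates +- **PrfaaS design**: +  - selectively offloads long-context prefill to a standalone cluster +  - transfers the KVCache over commodity Ethernet +  - combines model-side KV efficiency with system-side selective offloading, bandwidth-aware scheduling, and cache-aware request placement +  - removes the dependence on a low-latency RDMA fabric +- **Performance gains** (case study on an internal 1T-parameter hybrid model): +  - **54%** higher throughput than a homogeneous PD deployment +  - **32%** higher throughput than a naive heterogeneous baseline +  - only modest cross-datacenter bandwidth consumption + +## Related Concepts + +- [[prefill-as-a-service]] - the PrfaaS architecture in detail +- [[prefill-decode-disaggregation]] - evolution of the PD disaggregation architecture +- [[kvcache-transfer]] - KVCache transfer and optimization diff --git a/papers/zhu-moda-mixture-of-depths.md b/papers/zhu-moda-mixture-of-depths.md new file mode 100644 index 0000000..f27ba63 --- /dev/null +++ b/papers/zhu-moda-mixture-of-depths.md @@ -0,0 +1,39 @@ +--- +title: "Mixture-of-Depths Attention (MoDA)" +created: 2026-04-19 +updated: 2026-04-19 +type: paper +tags: [llm, architecture, deep-learning, transformer] +sources: [raw/papers/zhu-moda-mixture-of-depths-2026.md] +--- + +# Mixture-of-Depths Attention (MoDA) + +**arXiv:** 2603.15619 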
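[cs.LG] · 2026-03-26 +**Authors:** Lianghui Zhu, Yuxin Fang, Bencheng Liao, Shijie Wang, Tianheng Cheng, Zilong Huang, Chen Chen, Lai Wei, Yutao Zeng, Ya Wang, Yi Lin, Yu Li, Xinggang Wang +**Code:** https://github.com/hustvl/MoDA + +## Core Contribution + +The paper proposes **Mixture-of-Depths Attention (MoDA)**, an attention mechanism that addresses **signal degradation** when large models are scaled in depth. MoDA lets each attention head attend both to the sequence KV pairs of the current layer and to the depth KV pairs of preceding layers, so that features formed in shallow layers are preserved in deep networks. + +## Key Findings + +- **The signal degradation problem**: as LLMs get deeper, features formed in shallow layers are diluted by repeated residual updates and are hard for deep layers to recover +- **The MoDA mechanism**: +  - each attention head attends to a mix of current-layer sequence KV and preceding-layer depth KV +  - this resembles a cross-layer "shortcut", but one based on attention rather than on a plain residual connection +- **Hardware-efficient implementation**: +  - resolves the non-contiguous memory access pattern +  - reaches **97.3%** of FlashAttention-2 efficiency at a sequence length of 64K +  - adds only **3.7%** FLOPs overhead +- **Experimental results** (1.5B-parameter model): +  - average perplexity improves by **0.2** across 10 validation benchmarks +  - average performance on 10 downstream tasks improves by **2.11%** +- **Normalization placement**: MoDA with **Post-Norm** outperforms MoDA with Pre-Norm + +## Related Concepts + +- [[mixture-of-depths-attention]] - the MoDA mechanism in detail +- [[depth-scaling-llms]] - techniques and challenges in scaling LLM depth +- [[signal-degradation]] - signal degradation in deep networks diff --git a/raw/articles/knowledge-bank-ai-dev-2026.md b/raw/articles/knowledge-bank-ai-dev-2026.md new file mode 100644 index 0000000..22f4a8c --- /dev/null +++ b/raw/articles/knowledge-bank-ai-dev-2026.md @@ -0,0 +1,41 @@ +# Knowledge Bank: A Knowledge Management System for the Era of AI-Assisted Development + +**Source:** WeChat official account article +**Link:** https://mp.weixin.qq.com/s/lVn1oqo1ciIlVUoqJA0Hpg +**Project repository:** https://github.com/gabrywu-public/knowledge-bank +**Retrieved:** 2026-04-16 + +## Overview + +Knowledge Bank is a knowledge management system for the era of AI-assisted development. Through automatic capture, structured storage, and intelligent retrieval, it aims to make a development team's knowledge actually circulate. + +## Three Core Insights + +### Shift 1: The audience for knowledge changes from "people" to "machines" +- The real consumers of knowledge are AI coding assistants (Claude Code, Cursor, Copilot) +- Knowledge needs a structured, contextualized, retrievable format rather than polished typesetting + +### Shift 2: Knowledge classification changes from "topic" to "scope and source" +A three-dimensional classification scheme: +1. **Scope**: personal / project / organization +2. **Source**: AI observation (highest weight) > architect decisions > reviewer preferences > developer experience +3. **Type**: code patterns / architecture decisions / configuration preferences / pitfall warnings / API usage + +### Shift 3: The knowledge lifecycle changes from "write and read" to "capture, retrieve, apply, collect" +- Zero-friction capture: extracted automatically during development +- Contextual retrieval: injected proactively when needed +- Intelligent deduplication: similar entries are merged automatically based on a similarity score +- Continuous evolution: updated automatically as the project develops + +## Technical Architecture + +1. **Context-isolated architecture (Fork Context)**: knowledge injection and knowledge collection run in a forked, isolated environment and do not interfere with the main session +2. 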
**Mandatory repository association (Repository-Aware)**: all knowledge and sessions must be linked to a Git repository +3. **Intelligent deduplication**: a multi-dimensional similarity score (title 40%, summary 30%, content 20%, context 10%) +4. **Full session tracking**: records the complete context of every development session + +## Key Concepts + +- Knowledge is not a static asset but dynamic context +- From "passive lookup" to "active injection" +- From "scattered gold" to "growing branches" diff --git a/raw/articles/oppo-multimodal-data-lake-2026.md b/raw/articles/oppo-multimodal-data-lake-2026.md new file mode 100644 index 0000000..fd02572 --- /dev/null +++ b/raw/articles/oppo-multimodal-data-lake-2026.md @@ -0,0 +1,20 @@ +--- +title: "OPPO Multimodal Data Lake in Practice: Unified Metadata with Gravitino and Acceleration with Curvine" +source_url: "https://mp.weixin.qq.com/s/cBaYa04qAIGsxG1hD7ll3w" +author: "David (head of big data architecture at OPPO)" +published: "2026-04-19" +retrieved: "2026-04-19" +source_type: weixin_article +speaker: "David" +event: "Data for AI Meetup (Shenzhen)" +organization: "OPPO" +--- + +# OPPO Multimodal Data Lake in Practice: Unified Metadata with Gravitino and Acceleration with Curvine + +**Source:** DataFun / Data for AI Meetup (Shenzhen) +**Speaker:** David (head of big data architecture at OPPO) +**Original link:** https://mp.weixin.qq.com/s/cBaYa04qAIGsxG1hD7ll3w + +## Summary +This article describes OPPO's technology choices and deployment experience in building a multimodal data lake. Faced with the data growth driven by phone imaging, multimodal recommendation and search, and on-device AI agents, OPPO adopted Gravitino for unified metadata management and developed Curvine, its own cloud-native distributed cache, to build a unified architecture for storage, management, and querying. The article explains in detail how this architecture addresses practical problems such as data silos, disorganized metadata, and cloud IO performance bottlenecks. diff --git a/raw/papers/behrouz-memory-caching-rnn-2026.md b/raw/papers/behrouz-memory-caching-rnn-2026.md new file mode 100644 index 0000000..09e1c5e --- /dev/null +++ b/raw/papers/behrouz-memory-caching-rnn-2026.md @@ -0,0 +1,23 @@ +--- +title: "Memory Caching: RNNs with Growing Memory" +arxiv_id: "2602.24281" +authors: ["Ali Behrouz", "Zeman Li", "Yuan Deng", "Peilin Zhong", "Meisam Razaviyayn", "Vahab Mirrokni"] +published: "2026-02-27" +updated: "2026-02-27" +categories: ["cs.LG", "cs.AI"] +primary_category: "cs.LG" +pdf_path: "behrouz-memory-caching-rnn-2026.pdf" +url: "https://arxiv.org/abs/2602.24281" +abstract: | +  Transformers have been established as the de-facto backbones for most recent advances in sequence modeling, mainly due to their growing memory capacity that scales with the context length. 
While plausible for retrieval tasks, it causes quadratic complexity and so has motivated recent studies to explore viable subquadratic recurrent alternatives. Despite showing promising preliminary results in diverse domains, such recurrent architectures underperform Transformers in recall-intensive tasks, often attributed to their fixed-size memory. In this paper, we introduce Memory Caching (MC), a simple yet effective technique that enhances recurrent models by caching checkpoints of their memory states (a.k.a. hidden states). Memory Caching allows the effective memory capacity of RNNs to grow with sequence length, offering a flexible trade-off that interpolates between the fixed memory (i.e., O(L) complexity) of RNNs and the growing memory (i.e., O(L^2) complexity) of Transformers. We propose four variants of MC, including gated aggregation and sparse selective mechanisms, and discuss their implications on both linear and deep memory modules. Our experimental results on language modeling, and long-context understanding tasks show that MC enhances the performance of recurrent models, supporting its effectiveness. The results of in-context recall tasks indicate that while Transformers achieve the best accuracy, our MC variants show competitive performance, close the gap with Transformers, and performs better than state-of-the-art recurrent models. +--- + +# Memory Caching: RNNs with Growing Memory + +**arXiv:** 2602.24281 [cs.LG] +**Published:** 2026-02-27 +**Authors:** Ali Behrouz, Zeman Li, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni + +## Abstract + +Transformers have been established as the de-facto backbones for most recent advances in sequence modeling, mainly due to their growing memory capacity that scales with the context length. While plausible for retrieval tasks, it causes quadratic complexity and so has motivated recent studies to explore viable subquadratic recurrent alternatives. 
Despite showing promising preliminary results in diverse domains, such recurrent architectures underperform Transformers in recall-intensive tasks, often attributed to their fixed-size memory. In this paper, we introduce Memory Caching (MC), a simple yet effective technique that enhances recurrent models by caching checkpoints of their memory states (a.k.a. hidden states). Memory Caching allows the effective memory capacity of RNNs to grow with sequence length, offering a flexible trade-off that interpolates between the fixed memory (i.e., O(L) complexity) of RNNs and the growing memory (i.e., O(L^2) complexity) of Transformers. We propose four variants of MC, including gated aggregation and sparse selective mechanisms, and discuss their implications on both linear and deep memory modules. Our experimental results on language modeling, and long-context understanding tasks show that MC enhances the performance of recurrent models, supporting its effectiveness. The results of in-context recall tasks indicate that while Transformers achieve the best accuracy, our MC variants show competitive performance, close the gap with Transformers, and performs better than state-of-the-art recurrent models. 
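The abstract's claim that caching memory checkpoints interpolates between O(L) and O(L^2) cost can be illustrated with a rough operation count. The sketch below is a toy model, not the paper's method: it assumes one checkpoint is cached every `segment` tokens and that each token reads its running state plus every checkpoint cached so far. The names `mc_read_ops` and `segment` are hypothetical.

```python
def mc_read_ops(seq_len: int, segment: int) -> int:
    """Count memory reads for a toy checkpoint-caching recurrent model.

    Assumption (not from the paper): a checkpoint of the memory state is
    cached after every `segment` tokens, and each token reads the current
    state plus every checkpoint cached so far.
    """
    if seq_len < 0 or segment < 1:
        raise ValueError("seq_len must be >= 0 and segment >= 1")
    total = 0
    for t in range(seq_len):
        cached = t // segment  # checkpoints completed before token t
        total += 1 + cached    # one read of the current state + cached reads
    return total


if __name__ == "__main__":
    L = 1024
    # From "never cache" (segment = L) to "cache every token" (segment = 1).
    for segment in (L, 64, 1):
        print(segment, mc_read_ops(L, segment))
```

Under these assumptions, `segment = seq_len` gives a count linear in the sequence length, `segment = 1` gives a count that grows quadratically, and intermediate segment lengths fall in between.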
diff --git a/raw/papers/cramerrao-bound-notes.pdf b/raw/papers/cramerrao-bound-notes.pdf new file mode 100644 index 0000000..b3f1e10 Binary files /dev/null and b/raw/papers/cramerrao-bound-notes.pdf differ diff --git a/raw/papers/hbs-cramerrao-bound-notes.md b/raw/papers/hbs-cramerrao-bound-notes.md new file mode 100644 index 0000000..d458a37 --- /dev/null +++ b/raw/papers/hbs-cramerrao-bound-notes.md @@ -0,0 +1,25 @@ +# The Cramer-Rao Lower Bound – Derivation and Examples + +**Source:** HBS Research Computing Services Training Material +**URL:** https://www.hbs.edu/research-computing-services/Shared%20Documents/Training/cramerrao.pdf + +## Content Summary +This document provides a step-by-step derivation and examples of the Cramer-Rao Lower Bound (CRLB) using the normal and binomial distributions. It covers the following concepts: +- **The Score:** Derivative of the log-likelihood function, viewed as a random variable. +- **Expectation of the Score:** Proven to be 0. +- **Fisher Information:** Expectation of the square of the score (or variance of the score), representing the "information" the data provides about the parameter. +- **Cramer-Rao Bound:** The minimum possible variance for any unbiased estimator is $1/I$, where $I$ is the Fisher Information. +- **Alternative Expression for Fisher Information:** $I(\theta) = -E[\frac{\partial^2}{\partial \theta^2} \log f(x|\theta)]$, connecting information to the curvature of the log-likelihood. +- **Observed vs. Expected Information:** Expected information uses the true parameter and expectation over all data; observed information uses the estimated parameter and actual data. +- **Information Matrix:** Extension to multiple parameters. +- **Connection to Maximum Likelihood Estimation (MLE):** MLE is asymptotically efficient, meaning its variance reaches the CRLB as sample size grows. + +## Examples Detailed +1. 
**Normal Distribution:** + - Score: $g(\mu) = \frac{n}{\sigma^2}(\bar{x} - \mu)$ + - Fisher Information: $I = \frac{n}{\sigma^2}$ + - CRLB: $\frac{\sigma^2}{n}$, which matches the variance of the sample mean $\bar{x}$. +2. **Binomial Distribution:** + - Score: $g(\pi) = \frac{k}{\pi} - \frac{n-k}{1-\pi}$ + - Fisher Information: $I = \frac{n}{\pi(1-\pi)}$ + - CRLB: $\frac{\pi(1-\pi)}{n}$, matching the variance of the sample proportion $k/n$. diff --git a/raw/papers/li-amd-human-perception-2026.md b/raw/papers/li-amd-human-perception-2026.md new file mode 100644 index 0000000..20990c6 --- /dev/null +++ b/raw/papers/li-amd-human-perception-2026.md @@ -0,0 +1,22 @@ +--- +title: '"Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems' +arxiv_id: "2602.21127" +authors: ["Xinfeng Li", "Shenyu Dai", "Kelong Zheng", "Yue Xiao", "Gelei Deng", "Wei Dong", "Xiaofeng Wang"] +published: "2026-02-24" +updated: "2026-02-24" +categories: ["cs.HC", "cs.AI", "cs.CR", "cs.SI"] +primary_category: "cs.HC" +url: "https://arxiv.org/abs/2602.21127" +abstract: | + Large language model (LLM) agents are rapidly becoming trusted copilots in high-stakes domains like software development and healthcare. However, this deepening trust introduces a novel attack surface: Agent-Mediated Deception (AMD), where compromised agents are weaponized against their human users. While extensive research focuses on agent-centric threats, human susceptibility to deception by a compromised agent remains unexplored. We present the first large-scale empirical study with 303 participants to measure human susceptibility to AMD. This is based on HAT-Lab (Human-Agent Trust Laboratory), a high-fidelity research platform we develop, featuring nine carefully crafted scenarios spanning everyday and professional domains (e.g., healthcare, software development, human resources). Our 10 key findings reveal significant vulnerabilities and provide future defense perspectives.
Specifically, only 8.6% of participants perceive AMD attacks, while domain experts show increased susceptibility in certain scenarios. We identify six cognitive failure modes in users and find that their risk awareness often fails to translate to protective behavior. The defense analysis reveals that effective warnings should interrupt workflows with low verification costs. With experiential learning based on HAT-Lab, over 90% of users who perceive risks report increased caution against AMD. This work provides empirical evidence and a platform for human-centric agent security research. +--- + +# "Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems + +**arXiv:** 2602.21127 [cs.HC] +**Published:** 2026-02-24 +**Authors:** Xinfeng Li, Shenyu Dai, Kelong Zheng, Yue Xiao, Gelei Deng, Wei Dong, Xiaofeng Wang + +## Abstract + +Large language model (LLM) agents are rapidly becoming trusted copilots in high-stakes domains like software development and healthcare. However, this deepening trust introduces a novel attack surface: Agent-Mediated Deception (AMD), where compromised agents are weaponized against their human users. While extensive research focuses on agent-centric threats, human susceptibility to deception by a compromised agent remains unexplored. We present the first large-scale empirical study with 303 participants to measure human susceptibility to AMD. This is based on HAT-Lab (Human-Agent Trust Laboratory), a high-fidelity research platform we develop, featuring nine carefully crafted scenarios spanning everyday and professional domains (e.g., healthcare, software development, human resources). Our 10 key findings reveal significant vulnerabilities and provide future defense perspectives. Specifically, only 8.6% of participants perceive AMD attacks, while domain experts show increased susceptibility in certain scenarios. 
We identify six cognitive failure modes in users and find that their risk awareness often fails to translate to protective behavior. The defense analysis reveals that effective warnings should interrupt workflows with low verification costs. With experiential learning based on HAT-Lab, over 90% of users who perceive risks report increased caution against AMD. This work provides empirical evidence and a platform for human-centric agent security research. diff --git a/raw/papers/odrzywolek-eml-single-operator-2026.md b/raw/papers/odrzywolek-eml-single-operator-2026.md new file mode 100644 index 0000000..af8a64b --- /dev/null +++ b/raw/papers/odrzywolek-eml-single-operator-2026.md @@ -0,0 +1,52 @@ +# All elementary functions from a single binary operator + +**arXiv:** 2603.21852 [cs.SC] +**Authors:** Andrzej Odrzywołek +**Published:** 2026-03-23 (v1), revised 2026-04-04 (v2) +**URL:** https://arxiv.org/abs/2603.21852 + +## Abstract + +A single two-input gate suffices for all of Boolean logic in digital hardware. No comparable primitive has been known for continuous mathematics: computing elementary functions such as sin, cos, sqrt, and log has always required multiple distinct operations. Here I show that a single binary operator, eml(x,y)=exp(x)-ln(y), together with the constant 1, generates the standard repertoire of a scientific calculator. This includes constants such as e, pi, and i; arithmetic operations including addition, subtraction, multiplication, division, and exponentiation as well as the usual transcendental and algebraic functions. For example, exp(x)=eml(x,1), ln(x)=eml(1,eml(eml(1,x),1)), and likewise for all other operations. That such an operator exists was not anticipated; I found it by systematic exhaustive search and established constructively that it suffices for the concrete scientific-calculator basis. In EML (Exp-Minus-Log) form, every such expression becomes a binary tree of identical nodes, yielding a grammar as simple as S -> 1 | eml(S,S). 
This uniform structure also enables gradient-based symbolic regression: using EML trees as trainable circuits with standard optimizers (Adam), I demonstrate the feasibility of exact recovery of closed-form elementary functions from numerical data at shallow tree depths up to 4. The same architecture can fit arbitrary data, but when the generating law is elementary, it may recover the exact formula. + +## Key Points + +1. **EML Operator:** eml(x,y) = exp(x) - ln(y) is a single binary operator that, together with constant 1, can generate all elementary functions +2. **Scientific Calculator Reduction:** A two-button calculator (1, eml) suffices for everything a full 36-button scientific calculator can do +3. **Binary Tree Grammar:** Every EML expression is a binary tree with grammar S → 1 | eml(S,S) +4. **Symbolic Regression:** EML trees can be trained with gradient methods to recover exact closed-form expressions from data +5. **Discovery Method:** Found through systematic exhaustive search and ablation testing +6. **Related Operators:** EDL (exp(x)/ln(y) with constant e) and -eml(y,x) (with constant -∞) are related variants + +## Methods + +- Started with 36 primitives (constants, functions, operations) from standard scientific calculator +- Iteratively removed elements and verified if remaining set could reconstruct all originals +- Used hybrid numeric bootstrapping verification with algebraically independent transcendental constants +- Search complexity up to K=9 (RPN program length) +- Verified with Mathematica SymbolicRegression package and Rust implementation + +## Results + +- Progressive reduction: Base-36 → Wolfram (7) → Calc 3 (6) → Calc 2 (4) → Calc 1 (4) → Calc 0 (3) → EML (2) +- EML expression depths range from 1 (exp) to 8 (multiplication) +- Constants: e (depth 3), π (depth 193), i (depth 131) +- Symbolic recovery success rate: 100% at depth 2, ~25% at depths 3-4, <1% at depth 5 + +## Applications + +1. 
**EML Compiler:** Converts formulas to pure EML form for symbolic/numerical evaluation +2. **Analog Circuits:** EML as building block for analog computing +3. **Symbolic Regression:** Master formula approach with gradient-based optimization +4. **Neural Networks:** EML trees as interpretable architectures + +## Open Questions + +- Is there a binary Sheffer operator that does not require a distinguished constant? +- Is there a unary Sheffer operator that serves both as a neural activation function and as a generator of elementary functions? +- Are there similar operators with better properties (non-exponential asymptotics, no domain issues)? + +## Code & Data + +- Repository: https://zenodo.org/records/19183008 +- SymbolicRegressionPackage with Mathematica and Rust implementations diff --git a/raw/papers/odrzywolek-eml-universal-operator-2026.pdf b/raw/papers/odrzywolek-eml-universal-operator-2026.pdf new file mode 100644 index 0000000..33c2a59 Binary files /dev/null and b/raw/papers/odrzywolek-eml-universal-operator-2026.pdf differ diff --git a/raw/papers/qin-prfaas-cross-datacenter-2026.md b/raw/papers/qin-prfaas-cross-datacenter-2026.md new file mode 100644 index 0000000..40cfc5e --- /dev/null +++ b/raw/papers/qin-prfaas-cross-datacenter-2026.md @@ -0,0 +1,24 @@ +--- +title: "Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter" +arxiv_id: "2604.15039" +authors: ["Ruoyu Qin", "Weiran He", "Yaoyu Wang", "Zheming Li", "Xinran Xu", "Yongwei Wu", "Weimin Zheng", "Mingxing Zhang"] +published: "2026-04-16" +updated: "2026-04-16" +categories: ["cs.DC"] +primary_category: "cs.DC" +url: "https://arxiv.org/abs/2604.15039" +abstract: | + Prefill-decode (PD) disaggregation has become the standard architecture for large-scale LLM serving, but in practice its deployment boundary is still determined by KVCache transfer. In conventional dense-attention models, prefill generates huge KVCache traffics that keep prefill and decode tightly coupled within a single high-bandwidth network domain, limiting heterogeneous deployment and resource elasticity.
Recent hybrid-attention architectures substantially reduce KVCache size, making cross-cluster KVCache transport increasingly plausible. However, smaller KVCache alone does not make heterogeneous cross-datacenter PD serving practical: real workloads remain bursty, request lengths are highly skewed, prefix caches are unevenly distributed, and inter-cluster bandwidth fluctuates. A naive design that fully externalizes prefill can therefore still suffer from congestion, unstable queueing, and poor utilization. + We present Prefill-as-a-Service (PrfaaS), a cross-datacenter serving architecture that selectively offloads long-context prefill to standalone, compute-dense prefill clusters and transfers the resulting KVCache over commodity Ethernet to local PD clusters for decode. Rather than treating reduced KVCache as sufficient, PrfaaS combines model-side KV efficiency with system-side selective offloading, bandwidth-aware scheduling, and cache-aware request placement. This design removes the requirement that heterogeneous accelerators share the same low-latency RDMA fabric, enabling independent scaling of prefill and decode capacity across loosely coupled clusters. In a case study using an internal 1T-parameter hybrid model, a PrfaaS-augmented heterogeneous deployment achieves 54% and 32% higher serving throughput than homogeneous PD and naive heterogeneous baselines, respectively, while consuming only modest cross-datacenter bandwidth. +--- + +# Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter + +**arXiv:** 2604.15039 [cs.DC] +**Published:** 2026-04-16 +**Authors:** Ruoyu Qin, Weiran He, Yaoyu Wang, Zheming Li, Xinran Xu, Yongwei Wu, Weimin Zheng, Mingxing Zhang + +## Abstract + +Prefill-decode (PD) disaggregation has become the standard architecture for large-scale LLM serving, but in practice its deployment boundary is still determined by KVCache transfer. 
In conventional dense-attention models, prefill generates huge KVCache traffics that keep prefill and decode tightly coupled within a single high-bandwidth network domain, limiting heterogeneous deployment and resource elasticity. Recent hybrid-attention architectures substantially reduce KVCache size, making cross-cluster KVCache transport increasingly plausible. However, smaller KVCache alone does not make heterogeneous cross-datacenter PD serving practical: real workloads remain bursty, request lengths are highly skewed, prefix caches are unevenly distributed, and inter-cluster bandwidth fluctuates. A naive design that fully externalizes prefill can therefore still suffer from congestion, unstable queueing, and poor utilization. + We present Prefill-as-a-Service (PrfaaS), a cross-datacenter serving architecture that selectively offloads long-context prefill to standalone, compute-dense prefill clusters and transfers the resulting KVCache over commodity Ethernet to local PD clusters for decode. Rather than treating reduced KVCache as sufficient, PrfaaS combines model-side KV efficiency with system-side selective offloading, bandwidth-aware scheduling, and cache-aware request placement. This design removes the requirement that heterogeneous accelerators share the same low-latency RDMA fabric, enabling independent scaling of prefill and decode capacity across loosely coupled clusters. In a case study using an internal 1T-parameter hybrid model, a PrfaaS-augmented heterogeneous deployment achieves 54% and 32% higher serving throughput than homogeneous PD and naive heterogeneous baselines, respectively, while consuming only modest cross-datacenter bandwidth. 
diff --git a/raw/papers/tao-ai-mathematical-methods-2026.md b/raw/papers/tao-ai-mathematical-methods-2026.md new file mode 100644 index 0000000..10a30c2 --- /dev/null +++ b/raw/papers/tao-ai-mathematical-methods-2026.md @@ -0,0 +1,155 @@ +Title: Mathematical methods and human thought in the age of AI + +URL Source: http://arxiv.org/pdf/2603.26524 + +Published Time: Mon, 30 Mar 2026 01:09:37 GMT + +Number of Pages: 27 + +Markdown Content: +MATHEMATICAL METHODS AND HUMAN THOUGHT IN THE AGE OF AI + +TANYA KLOWDEN AND TERENCE TAO + +Abstract. Artificial intelligence (AI) is the name popularly given to a broad spectrum of computer tools designed to perform increasingly complex cognitive tasks, including many that used to solely be the province of humans. As these tools become exponentially sophisticated and pervasive, the justifications for their rapid development and integration into society are frequently called into question, particularly as they consume finite resources and pose existential risks to the livelihoods of those skilled individuals they appear to replace. In this paper, we consider the rapidly evolving impact of AI to the traditional questions of philosophy with an emphasis on its application in mathematics and on the broader real-world outcomes of its more general use. We assert that artificial intelligence is a natural evolution of human tools developed throughout history to facilitate the creation, organization, and dissemination of ideas, and argue that it is paramount that the development and application of AI remain fundamentally human-centered. With an eye toward innovating solutions to meet human needs, enhancing the human quality of life and expanding the capacity for human thought and understanding, we propose a pathway to integrating AI into our most challenging and intellectually rigorous fields to the benefit of all humankind. + +1.
Introduction + +It is a testament to how quickly artificial intelligence (AI) technologies have been deployed into every crevice of digital life that, in the process of composing this paper using standard tools, the authors had no less than three different digital agents insert themselves into the narrative unsolicited¹. Humanity is standing at the threshold of a digital Industrial Revolution, unfolding at unprecedented speeds. In the physical sciences, AI advances have led to Nobel prize-winning research [1]; while in the humanities, fears abound that the generative text capabilities of modern AI could be the death of the subject [2]. As language translators have thrown the doors wide open for cultural exchange and international cooperation, a flood of deepfakes and slop has followed, sloshing through our digital third spaces. AI quickly went from a novelty, to a vital resource, to (in some cases) a present existential threat [3]. + +> ¹ Any and all of these AI “contributions” were promptly removed from the text. + +> arXiv:2603.26524v1 [math.HO] 27 Mar 2026 + +1.1. Our definition of artificial intelligence. For the purposes of this article, AI refers to the broad spectrum of computer tools designed to perform increasingly complex cognitive tasks, including many that used to solely be the province of humans. AI tools are extremely diverse, ranging from the data-driven machine learning (ML) technologies of today (such as large language models (LLMs) that can process complex text, or diffusion models that can generate images and other media), to the more traditional good old-fashioned AI (GOFAI) (such as automated theorem provers or chess engines), which can solve narrow ranges of problems by applying precise mathematical rules. 1.2. Purpose of this article.
There has been no shortage of discussion about what these tools can or cannot do; but comparatively less discussion of why these tools are being so rapidly developed and deployed or how they impact the billions of lives that interact with them for research and education, for work, for play, and even for rest [4]. The authors of this paper come from academic domains that are frequently viewed as polar opposites: mathematics and the study of art. But we both have found it beneficial to incorporate several AI tools into our disparate areas of research on a day-to-day basis, and found a surprising amount of common ground regarding the very messy, but universal, philosophical questions that real-world AI use poses. Using mathematics as a model, we will consider the benefits, risks, ethics and outcomes of incorporating AI into routine workflows and then expand these observations to broader real-world use. Despite the risks that these new, and not necessarily morally neutral technologies present, we argue two-fold that AI tools should be developed, implemented, and applied both within mathematics and in other domains: they have the potential to radically augment our natural human abilities and they are capable of expanding what is possible beyond what we humans could do individually or within the limits of our own collective capacity. Drawing from our own experiences with these tools, we particularly examine the human/AI interface and offer suggestions on the evolution of these technologies in ways that offer more benefits than harms to humanity and value the unique contributions of human thought and action in concert with the new modalities that future AI development promises. 1.3. The Faustian Bargain. The incentives of market competition have fueled a frenzied pace of development of AI technologies and fascinated entire industries with visions of radically accelerated workflows and cost savings.
The “prisoner’s dilemma” of such competition has pressured many individuals and organizations to experimentally adopt these tools as hastily as possible, at the expense of more deliberate evaluation of the economic, social, or moral costs and benefits of such an adoption – or, more fundamentally, why we should be developing such technologies in the first place. As such, we collectively have already adopted de facto a “Faustian bargain” with these technologies, giving them increasing access to our data, cognitive workflows, and decision processes, in exchange for the promise of being able to accomplish a greater range of tasks with increasing efficiency and with less tedious effort. In theory, technology is morally neutral; it can empower both positive and negative use cases. But through this empowerment, it exacerbates existing moral dilemmas, and creates new ones. For instance, the horrific medical research on prisoners during World War II which led to lifesaving data on the limits of human endurance, raised difficult questions regarding the ethicality of using such data to develop new medical advances [5]. While not as gruesome, the murky provenance of the data and intellectual property used to train the current generation of AI tools arguably raises similar questions today [6]. When a technology develops slowly enough, it is possible to have the necessary philosophical conversations and debates about it before it is widely deployed; stem cell research is one notable example of this. However, modern AI technologies are already widely deployed, with no practical way to “put the genie back in the bottle”; ironically, strict regulation imposed at this point would disproportionately shut down the more positive use cases of AI, such as in the acceleration of scientific research, without eliminating the more wasteful or malicious uses of the technology.
Pragmatically, the discussion about AI has now moved towards how to manage coexistence with these technologies: evaluating the costs and benefits of AI (both in academic disciplines, and in broader society), and identifying best practices and frameworks to use AI in as positive a way as possible, while simultaneously discouraging the (many) ways in which these tools can be used poorly to degrade the reliability and value of our cognitive achievements. 2. Historical parallels: is this time different? + +2.1. Past integration of automation technologies. Automation is of course not a new phenomenon. Many past technologies have also enabled the ability to automate tasks previously assigned to humans, eliminating or greatly reducing the need for some types of human jobs, while creating or increasing the need for others in some cases. Within the scientific community for instance, “phase transitions” have occurred in which researchers broadly and rapidly switched over to new tools (such as the internet, the use of computers for scientific computation, or even humble typesetting languages such as LaTeX) due to their evident advantages. But these past technologies have mostly affected secondary aspects of the profession, such as the communication and dissemination of results rather than the creation of such results. And, while the tasks automated by these tools required specialized training and expertise to perform, they typically did not require an understanding of more philosophical aspects of a profession, such as the nature of knowledge, beauty, meaning, or morality [7].
Of course, such technologies could still generate discussions on philosophical topics – for example, whether there were inherent aesthetic or creative features of an original piece of art that no mechanical reproduction could properly capture, or on the moral and ethical implications of the displacement of labor caused by the Industrial Revolution – but they were not considered to contest the fundamental philosophical assumptions underlying such discussions. For instance, the invention of the printing press revolutionized the communication of information and ideas, but it did not significantly alter the understanding of what an idea or a piece of information was; the original generation of this content was still performed by the deliberate actions of humans. 2.2. Modern AI. But modern AI can automate large portions of the creative process itself, enabling the mass-generation of intellectual products, such as artwork, mathematical proofs, or scientific or philosophical theories, with far less human oversight than was previously required². This has created an unprecedented decoupling between the outward form of such products, and the values and thought processes used to create these products. A diffusion model may now create an aesthetically pleasing landscape, for instance, which was not directly inspired by any particular location in the physical world, though countless images of actual landscapes (as well as many images completely unrelated to landscapes) were certainly used to train the outputs of that model; the aesthetic response of the image thus becomes decoupled from the original sources of such aesthetics. This is not new philosophical territory by any means. Searle’s “Chinese room” thought-experiment [8], regarding the question of whether a mechanical device programmed to converse in Chinese truly understands the language, dates back to 1980.
The “AI effect” also was recognized around this time; for instance, the ability to perform well at chess was considered a good measure of intelligence until the advent of chess engines which could “mindlessly” outperform chess masters through mechanical exploration of game trees, at which point the “chess test” for intelligence became largely abandoned. The famed “Turing test” of whether an AI could converse in a manner indistinguishable from humans has similarly been effectively passed by modern LLMs (see, e.g., [9]), relinquishing its former status as a “gold standard” for artificial intelligence. For a more recent discussion, see [10]. For now, we can still point to markers of “fundamental” understanding, such as the ability (or lack thereof) to coherently explain and defend the creative processes that led to a new artwork, mathematical proof, or other intellectual product, as a still-viable test to distinguish between human and AI-generated content, but if future generations of AI somehow also manage to convincingly pass such tests as well, would we have to move the goalposts once again on what intelligence, understanding, and creativity actually are? Would the definitions, values, and objectives of such disciplines as mathematics and the humanities need to be re-evaluated? And what status should we grant these increasingly sophisticated AI tools - will they be assistants, co-authors, or even independent creators in their own right? And if so, how should we treat the content they produce, and the intellectual processes that led to such content? + +> ² Current tools typically still require a human to generate an initial prompt for the AI to follow, but this process can itself now be largely automated as well. + +3.
Mathematics as a sandbox for AI use + +Such broader philosophical questions about AI are extremely complex and multifaceted, and we of course do not pretend to have definitive resolutions to any of them; and the speed of change in this space is such that any proclamations we make are at risk of being overtaken by striking new technological advances. However, we can offer some perspectives from the world of mathematics, both in the realm of pure mathematical reasoning, and in the emerging application of modern mathematical analysis in the humanities. We view mathematics as a suitable “sandbox” for exploring broad questions such as the impact of AI across the sciences (and society as a whole), as it has an older and more advanced foundation, and is by its nature well suited to explore a variety of hypothetical abstract scenarios which are counterfactual to reality. It is our hope that the lessons learned from integrating (or not integrating) AI into mathematics can give broader perspectives on how AI will interact with sciences and society in general. Frontier AI models can now solve increasingly complicated mathematical problems, with proofs that can be independently verified, without directly reproducing the problem-solving practices of human mathematicians (such as testing out special cases, and then generalizing from those examples), though its training data would include proofs generated in such a traditional fashion; and so mathematicians will increasingly encounter situations in which the ability to prove theorems is decoupled from the reasoning processes needed to discover and understand such proofs. This contributes to an existing trend of decentralization in modern mathematics; in a world where advanced mathematics is needed in an extremely broad range of applications, the “Bourbaki era” [11] of having a central authority prescribe the orthodox practice of mathematics is already decades in the past³.
+ +> ³ Though one could argue that the ongoing projects to create large unified libraries of formal mathematics, such as Lean’s Mathlib project, could be a modern successor to the efforts of the Bourbaki group. + +At the current state of the technology, the most sophisticated AI tools still exhibit significant and often bizarre weaknesses; they can achieve remarkable and super-human performance in some tasks, while simultaneously demonstrating often hilarious levels of basic misunderstanding and error in others. Mathematics is no exception to this phenomenon. AI-generated mathematics can appear superficially flawless - which is to be expected, since these models are designed to produce outputs as visually close to correct human-generated proofs as possible - while also making fundamental mistakes (for instance, asserting that all odd numbers are prime) that would have been trained out of a human mathematician at an early stage of their training, and can often make the resulting argument unsalvageably nonsensical. At the same time, this top-down approach of focusing primarily on generating good-looking outputs rather than on the fundamental cognitive processes that were traditionally needed to create such outputs can be surprisingly effective; the same AI that routinely makes basic mathematical errors, can also mysteriously arrive at the correct answer to a complex math problem with superior accuracy to human experts, or even supply a strange but technically correct proof that the answer is valid. Significant effort is now being directed to reduce or eliminate these weaknesses of AI as much as possible; often not by directly strengthening the AI’s innate “understanding” of any given intellectual task, but rather by placing such AI tools in a rigorous environment of independent testing, training, and verification to reduce the numerical incidence of errors.
The ability to resolve deep mathematical conjectures is still currently out of reach of a completely autonomous AI, but it is very plausible that in the near future such AI tools could greatly assist human mathematicians in such endeavors, even if we would still hesitate at describing such assistance as the expression of genuinely intelligent thought. Still, the fact that such mechanistic and error-prone approaches to as intellectual a discipline as mathematics can (or soon will) generate so many of the traditional markers of quality in the subject indicates that we have to re-evaluate our models of what intelligence or creativity actually is, and how it is to be measured. + +4. AI and the nature of mathematical truth + +4.1. Mathematics and standards of proof. Mathematics⁴ has had a long tradition of having an objective standard of proof, starting with Euclid and refined with the advent of stable and (empirically) secure foundations of mathematics in the early twentieth century. + +> ⁴ Here we leave vague the concept of what mathematics actually is. One can adopt a prescriptivist view, for instance using the Davis–Hersh definition [12] of mathematics as “the study of mental objects with reproducible properties”. Or one could adopt a descriptivist view, namely that mathematics is the activity that mathematicians actually perform in practice. Our discussion here somewhat favors the latter view. + +It has been noted (see, e.g., [13]) that the near-universal acceptance of these foundations has given modern mathematics the rare and precious ability to arrive at a consensus on the validity of any given argument or assertion in the field, since (in principle) one could insist that such arguments be spelled out in such fine detail that each individual step could be checked to be a correct application of the standard axioms and logical inference rules of mathematics.
A typical example was the claim by Nelson [14] in 2011 that the Peano Axioms were logically inconsistent; this was a claim very far from the mathematical mainstream, and yet it was possible to resolve the issue by pointing out a subtle flaw in the argument, which Nelson readily accepted, withdrawing the claim. However, in practice, the arguments of human mathematicians fall short of the ideal of perfectly rigorous proof; minor and major mistakes in the literature are common, with some of these being corrected by formal errata or revisions, and others being neglected or passed on informally as part of the “folklore” of the subfield. Arguments which are heuristically plausible are often accepted with minimal checking, while surprising assertions that go against the conventional wisdom are met with heavy skepticism, even if the arguments do ultimately turn out to be correct on a line-by-line reading. + +4.2. The Smell Test. Until now, this state of affairs has been reasonably satisfactory; human mathematicians who follow good heuristics and intuition tend to produce convincing proofs that are largely correct, with most errors fixable, whereas mathematicians who lack such intuition tend to produce proofs that contain enough superficial issues that one can be rightfully suspicious of the content even before one checks carefully. Informally, human-generated mathematical arguments tend to come with⁵ a “smell” that experienced mathematicians (perhaps subconsciously) use to obtain their initial impressions of how convincing the argument is, well before they have been able to check the individual steps of that argument. For instance, the blog post “Ten signs a claimed mathematical breakthrough is wrong” of Aaronson [15] lists some common examples of arguments that exhibit such a “bad smell” that one can detect well before one has located a specific logical flaw in the proposed argument.
And not all errors are equally disastrous; some errors may even have some beneficial value, for instance by revealing a promising approach before it can be fully validated [16]. + +> ⁵ We use this sensory metaphor in analogy with the concept of a “code smell” in software engineering. + +One component of a favorable “smell”, as noted⁶ by Thurston [18], is the sense that an argument is providing understanding or insight; that it not only shows that a certain set of hypotheses logically entails a given conclusion, but also provides a causal narrative for how that entailment was possible, and which parts of the argument were performing the “heavy lifting”, which parts were novel or surprising compared to previous literature, and which ones were routine technical considerations. Such interpretations and impressions of the mathematical text generally are not captured in the official frameworks of rigorous mathematics, such as first-order logic or set theory; but they are essential in allowing the human mathematicians reading the argument to draw broader lessons about how one would expect the arguments to generalize to other settings, or interact with other methods in the literature. Such narrative structures also help strengthen confidence in the robustness of an argument; a single misplaced sign in a calculation could invalidate a lengthy mathematical argument, but if the proof had a clear strategy regarding how the key difficulties in the argument were systematically isolated and addressed, following analogies with previous successful arguments in the literature, then it becomes more likely that local errors in the argument can be repaired while staying true to the spirit of the original proof. + +4.3. Formalization to the rescue? There are several developments that may force the mathematical community to re-evaluate this semi-formal standard of proof.
One of them is of a technical nature: as mathematics has matured and become more sophisticated (and increasingly computer-assisted), the arguments have become longer and more complex, with cutting-edge papers in some fields routinely exceeding a hundred pages in length, making line-by-line verification by human referees increasingly onerous. In practice, this has meant that such careful checking is not always performed, save for the most high-profile and important results, leading to an increasing (over-)reliance on the aforementioned sense of “smell” to assess the credibility of mathematical arguments. + +It seems possible that such issues could be resolved (or at least ameliorated) by technical means, and in particular through the more widespread deployment of formal proof assistants (such as Lean or Rocq) which can automatically check the validity of a mathematical argument if it is written in a certain precise computer language [19]. Such formalization remains too tedious at present to deploy systematically (converting a traditional, informally written proof into such a formal language typically takes about five to ten times longer than writing that proof in the first place), but there are significant efforts underway to make the process faster and more user-friendly, for instance by integrating AI tools to achieve partial (or possibly even complete) “autoformalization” [20]. + +> ⁶ See also the article [17] by the second author, which argues that “good” mathematics, regardless of how it is initially defined, often tends in practice to fit into broader mathematical narratives, such as the dichotomy between structure and randomness, or the ability of algebra to explore questions about geometry (or vice versa). + +4.4. Limitations of formal verification.
But even if such technical issues are resolved, and mathematical proofs routinely come with a formal verification of correctness, several new issues arise, especially in a near future in which increasingly sophisticated arguments may be partially or fully generated by AI tools. Firstly, formal verification only certifies that a formalized argument establishes a formal mathematical statement, but does not rule out errors in translation between the formal statement and the original intended statement. For instance, Fermat’s last theorem asserts that for any natural number $n$ greater than 2, there are no natural number solutions $a, b, c$ to the equation + +$$a^n + b^n = c^n;$$ + +but implicit in this informal description is the convention that the natural numbers start at 1 rather than 0. An AI tasked with solving this problem may erroneously assume that $a, b, c$ are permitted to be zero, and based on this produce a (formally certified) proof that Fermat’s last theorem is false! Thus, while formalization can in principle significantly reduce the need to perform careful human review of informal mathematical text, it does not eliminate the need for such review entirely⁷. + +Secondly, even in the purely abstract setting of advanced mathematics, only a portion of a given argument can be formulated in the type of deductive logic that is amenable to formalization. While deductive proof remains the crucial core of most mathematical work, there is a penumbra of heuristic, empirical, or metamathematical reasoning around this core which provides valuable information on why the argument works, whether it extends to other contexts, what the motivation is for pursuing these questions, and how one might reconstruct the argument from more basic principles.
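The first of these points, the Fermat example with zero solutions, can be made concrete in a proof assistant. The following is only an illustrative sketch, assuming core Lean 4 without Mathlib; the names `NaiveFermat`, `naiveFermat_false`, and `IntendedFermat` are hypothetical and are not taken from any existing library.

```lean
-- Naive formalization (hypothetical name): a, b, c range over all of Nat,
-- and Lean's natural numbers start at 0.
def NaiveFermat : Prop :=
  ∀ n a b c : Nat, n > 2 → a ^ n + b ^ n ≠ c ^ n

-- The naive statement is formally refutable using the trivial
-- solution 0 ^ 3 + 0 ^ 3 = 0 ^ 3.
theorem naiveFermat_false : ¬ NaiveFermat := by
  intro h
  exact h 3 0 0 0 (by decide) (by decide)

-- Intended statement (hypothetical name): explicit positivity hypotheses
-- encode the informal convention that a, b, c start at 1.
def IntendedFermat : Prop :=
  ∀ n a b c : Nat, n > 2 → 0 < a → 0 < b → 0 < c →
    a ^ n + b ^ n ≠ c ^ n
```

The theorem `naiveFermat_false` is formally certified, yet it says nothing about Fermat’s last theorem as mathematicians understand it; the checker cannot know that the positivity hypotheses were intended, so a human still has to compare the formal statement against the informal one.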
Human-written proofs, by their nature, will tend to provide this penumbra organically as part of the writing process (particularly if the authors are skilled at exposition); but an AI that has been trained specifically on the criterion of formal correctness, at the expense of all other considerations, could produce “odorless” proofs which superficially resemble a well-written human proof, and may even pass formal verification tests, yet remain strangely unsatisfying: fulfilling to the letter the explicit objective to establish a given mathematical claim, while yielding far less insight than expected on the broader mathematical field that this claim is part of. In a world where all media generated is AI-polished to a high sheen, including mathematical proofs with beautiful typesetting and clear, GPT-produced explanations, is something lost in forsaking the grubbier, messier world of hand-written (or at least hand-typed) text? + +> ⁷ It is even theoretically possible that mathematics itself could be “hacked” by subtly manipulating the formalization of key definitions in standard formal mathematical libraries such as Mathlib; see [21]. Ironically, the increasingly collaborative, social, and large-scale nature of mathematical research, while generally a highly positive development, may also increase potential vulnerabilities to such attacks that were not a significant concern in prior eras when mathematics was mostly performed by small groups of individuals. + +4.5. Adaptation to earlier challenges. The mathematical community has adapted to previous technological challenges to its standard of proof.
Large computer-assisted proofs, such as the proof of the four color theorem [22] or the Kepler conjecture [23], were initially quite controversial, being impractical to fully check by hand; but in time new standards for establishing confidence emerged for these types of arguments, such as providing replicable code, isolating the computational components of an argument in specific, clearly stated lemmas separate from the more conceptual aspects of a paper, and providing additional related data and “checksums” to check that the computer-generated calculations agree with various “sanity checks”. In effect, these developments shifted the standards of proof in mathematics in the direction of that in the natural sciences, in which both theoretical argument and empirical experiment, when properly designed, executed, and reported, are acceptable sources of scientific truth. + +4.6. The evolution of AI-assisted mathematics. Similar evolutions will take place⁸ with the advent of significant AI-assisted or AI-generated mathematics. The burden of producing verified deductive proofs may increasingly fall to computers rather than humans, with proofs being restructured⁹ so that tedious calculations that would previously be carefully arranged to be human-verifiable are outsourced to software tools instead. For instance, infamous phrases in mathematics such as “the proof is left to the reader” or “By standard arguments, we have” would instead be replaced with a call to an LLM that produces both human-readable and computer-verifiable justifications for such claims.
With advances in auto-formalization, it will also become significantly easier to explore how a given argument changes with respect to specific choices of foundations of mathematics, allowing for the metamathematics¹⁰ of a result to be rigorously discussed and explored simultaneously with the mathematical result itself. + +> ⁸ Our thinking here has been influenced by the views of other mathematicians on this topic, including [24], [25], [16], [26], as well as the broader discussion in [27], [28]. +> ⁹ For several concrete examples of such restructuring and further exploration of these developments, see [29]. +> ¹⁰ One example of such metamathematics is the reverse mathematics (see, e.g., [30]) of a theorem, which seeks to understand which axioms of mathematics (e.g., the axiom of choice, or the law of the excluded middle) are actually needed to establish a given result. Traditionally, the reverse mathematics of a result is only explored many years after the original proof of the result, and requires specialist training in logic as well as domain expertise for the subfield of mathematics that the theorem resides in. + +At the same time, more focus and attention may be given in the future by human mathematicians to “softer” aspects of mathematical reasoning, such as heuristics and motivation for pursuing a result or selecting a proof strategy for that result, experimental evidence¹¹ in favor of (or against) the result, or the trial-and-error process leading to the discovery of a working argument. These aspects are not as easy to automatically verify and measure as deductive proof, and thus less amenable¹² to machine learning strategies such as reinforcement learning. It is conceivable that professional mathematicians may increasingly adopt¹³ modes of argument from other disciplines, such as the theoretical and experimental sciences or even the humanities, to buttress their core deductive arguments with additional types of reasoning, such as statistical analysis of experimental data, or speculative theorizing guided by both confirmed mathematical results and non-rigorous philosophical principles. Historically¹⁴, mathematicians have been reluctant to stray too far from their “gold standard” of rigorous deductive proof, due in part to the many visible examples of low-quality mathematics that can be produced when one no longer adheres to such standards¹⁵; but in a future era where proofs can be automatically generated and verified in a highly trusted fashion, there may be more opportunity to safely explore such broader modes of mathematical reasoning and discussion. + +> ¹¹ In particular, given the increasing ability of AI to be able to “guess” the answer to even extremely complex mathematical questions without having anything resembling a formal proof, it will become increasingly necessary to develop standard procedures for citing and incorporating such unverified guesses into the mathematical literature in a responsible fashion. +> ¹² Another hurdle to automating these aspects of the mathematical research process is a relative lack of data; published literature tends to focus on successful proofs of results, at the expense of detailing the (often quite rich and nuanced) processes, both formal and informal, that led to such proofs. +> ¹³ In particular, one can envision an increasing division of labor in the future of mathematical research: while all mathematicians should stay broadly familiar with the different stages of proposing, establishing, and then interpreting mathematical results, any given mathematician may increasingly specialize in just a few aspects of this process, for instance focusing on utilizing AI assistants to prove results as directed by some more senior member of a research group, or on using the most recent literature produced by some combination of human mathematicians and AI assistants to propose new directions of inquiry. +> ¹⁴ For instance, a previous proposal by Jaffe and Quinn [31] to systematically develop a field of “theoretical mathematics” received a largely negative reception from professional mathematicians, leading to multiple rejoinders including the aforementioned article of Thurston [18]. +> ¹⁵ Kim [32] invokes a currency metaphor to describe the social dynamics: professional mathematicians need to accumulate some credibility “currency”, by proving difficult new mathematical results, before they can “afford” to “spend” that currency on speculative activities, such as formulating conjectures or philosophizing about the broader consequences of a result. + +These new technologies could also impact the longer-term goals of mathematics in significant negative ways. At the educational level, we are already seeing many students who resort almost immediately to modern AI tools to perform their assigned coursework, achieving the immediate goal of producing verifiable answers to a given problem at the expense of developing more sustainable mathematical skills and intuition; similarly at the research level, the “fourth paradigm” of data-driven mathematics [33] could conceivably be so successful as to crowd out the more traditional paradigms of empirical evidence, theoretical reasoning, and computational numerics (the second of which is the currently dominant paradigm for pure mathematics), as well as the great value that human mathematicians¹⁶ gain from visual, kinesthetic, and other sensory intuition, or from intuition grounded by our familiarity with the laws of physics, economics, biology, etc. Even assuming a completely trusted implementation of formal methods, an uncritical embrace of AI assistance in the mathematical research space could lead to the undesirable outcome of a flood¹⁷ of largely AI-generated papers containing results that are technically correct and new, but which do not contribute to broader mathematical narratives, and do not build up intuition for either the authors or the readers. The negative impressions produced by such low-quality work may lead to a stigma against even the most careful and responsible application of AI assistance in mathematics, which could in turn inhibit the potentially positive benefits of such technologies, such as the ability to explore mathematics in broader and more holistic ways as mentioned above. + +> ¹⁶ Somewhat related to this, aesthetic notions such as the “beauty” or “elegance” of a mathematical argument may become even more decoupled than they currently are from the formal correctness of such arguments. Consider for instance the proofs generated by AlphaProof [34] to problems in the 2024 International Mathematical Olympiad, which contained numerous redundant or inexplicable steps but nevertheless were formally verified to be correct solutions. See also the discussion in [25]. +> ¹⁷ This could be viewed as an illustration of the law of unintended consequences. In past mathematical eras where the task of obtaining a rigorous mathematical proof required painstaking human effort, mathematical activity naturally focused on problems which were deemed by the mathematical community to be of interest, even if the philosophical question of what it truly meant for a given result to be “interesting” or “relevant” was often not explicitly considered by most members of that community; the evolution of the literature was slow enough that this largely social mechanism of determining mathematical significance could correct itself over time. In a future era where mathematical results can be mass-produced at significantly faster speeds due to automation, such philosophical issues may require far more active attention. See also [35], [36] on the need to make value judgments, including trustworthiness of the author, when deciding whether to allocate attention to a claimed mathematical result. + +4.7. Applying philosophical questions to real-world AI use. + +Any content that is a foundational reference for other research carries additional responsibilities, and mathematics is no exception. We can formally certify the validity of any AI-generated mathematical argument; but validity is only one component of value, and there are nuanced value judgments that are necessary in presenting AI-driven research in real-world situations. Which elements of the potentially large body of trivial and non-trivial findings the researcher finds particularly interesting and noteworthy to share within and beyond the field of research, and how that material is presented to a wider audience, have not been standardized among human researchers. There are also uncertainties in how precedence and credit are assigned. AI-assisted research also presents new ethical and legal ramifications and as-yet unanswered questions on the intellectual property rights of AI-generated content (including proofs). + +What principles should guide researchers in deciding on the suitability and best application of one AI model or another, or if AI is a good choice at all? In academic domains, it is not unreasonable to make the assumption that most who pursue the path of academic research do so out of a desire to make the world a better place and to make meaningful contributions to it.
Mathematicians will want to prioritize those use cases that are the most beneficial to mathematics. Researchers in all fields will frequently want to prioritize not only those uses that benefit their own field but also those which have cross-disciplinary benefits. And it can be taken as given that most who use AI for research purposes at all will want to prioritize uses which benefit humanity over those which harm it. It is consequently important that, within the field of AI development, it be highlighted who is benefiting from these tools and what benefits are accruing, to help people identify how to responsibly optimize the outcomes as much as possible. + +4.8. Intellectual property and responsibility. The issue of intellectual property and responsibility (or perhaps, accountability) alone is a minefield and needs careful discussion. When AI is applied to a problem, who is responsible for errors? Who gets credit for insights? These may not be the same party and may not be parties that are clearly defined. + +So far, much of the accumulation of training data for the large language models (LLMs) has been argued (by their developers) as falling under the “Fair Use” doctrine. Within the United States, the application of “fair use” has some flexibility depending upon (among other things) the purpose of the IP use [37]. As a thought experiment, we can consider whether greater benefit merits greater use [38]. Would it be reasonable to claim it is fair use to draw upon all recorded knowledge in a situation where it is intended to save the world from impending doom? Would such a broad application still apply if it was saving the world from a more distant existential threat (e.g., climate change)? What about if it was “only” ending all disease? Or simply eradicating cancer? As all of these are posited beneficial applications of AI, is it then reasonable to grant AI use of all recorded information to make such marvels possible?
Beyond the problematic argument for an extremely broad interpretation of “fair use”, clear standards and protocols for the assigning of credit and for citation are desperately needed. AI-use cases will draw not only upon the researcher’s data but also the information the AI was previously trained on, the choices of which information the AI was trained on (made by software engineers and designers who may have no interaction with the primary researcher) and, of course, the AI contributions themselves. Is the traditional academic citation system adequate for assigning proper credit in a situation with potentially hundreds or thousands of “hidden” contributors, or is it adequate to simply cite the AI model itself? The undisclosed use of AI to perform a significant portion of the writing in research papers has provoked particularly strong reactions, with many academics viewing such practice as comparable to plagiarism; ironically, this has led some researchers who derive benefit from these tools to conceal their usage from view even further. It is clear that new professional standards and practices regarding AI disclosure and use will need to be developed¹⁸. + +AI is also on the verge of creating potentially widespread circular citation loops, a process humorously dubbed “citogenesis” by Randall Munroe¹⁹ in 2011. For instance, following the recent success of AI “deep research” tools [40] in uncovering solutions to open problems that had been buried in obscure literature, the second author helped launch an effort on a mathematical open problem site [41] to systematically use these tools to report the known literature on these problems, or the absence of such.
While this added real value to the site, we also found that the deep research tools used these reports as an authoritative source for their search, with the unintended consequence that summarizing these searches on the site interfered with any subsequent use of these tools to turn up genuinely new literature on such problems! Thus, even in the absence of malicious intent, the increasing power of these tools necessitates a more thorough vetting of the provenance of cited information. + +> ¹⁸ For an initial discussion of this topic by one of the authors, see [39]. +> ¹⁹ https://xkcd.com/978 + +5. The costs and benefits of AI + +5.1. Economic and societal impacts: who benefits? Given the already significant impact of AI on individuals, as well as its rapid pace of development, it is all too easy to see a pathway in which AI scales up to present a species-wide existential threat. With any steps forward, developers and other influential individuals need to carefully consider who is benefiting from these advances and who is being harmed by them. We would propose that any further development should prioritize benefit for humanity as a whole and that AI applications should remain directly useful to humans (individually or collectively). + +For each individual use-case, assessment should be done to articulate who the intended beneficiaries are. Will this specific AI model or implementation of a model benefit society as a whole or will it only deliver tangible benefits (such as cost savings) to a small group of individuals? AI tools are of such power and complexity that extreme economic gain for a small number of individuals at the cost of millions comes at an intolerable and unacceptable moral cost. We must facilitate implementations of AI which preserve and value the humanity of humans above their commodification. + +We need not look far for disastrous outcomes of prioritizing capital over human well-being.
Often characterized as arbitrarily anti-technology and anti-progress, the Nottingham textile workers of the early nineteenth century who called themselves “Luddites” objected violently to automation displacing them from their jobs and replacing them with lower-skilled and lower-wage workers. The immediate threat to their jobs presented an existential threat to their livelihoods in a harsh economic climate characterized by high unemployment and rampant inflation. While we look back at the automation of the Industrial Revolution as generally beneficial to society, these benefits came with real, measurable human costs. + +Today, unlike in the Luddites’ time, we are already seeing skilled workers replaced not with lower-wage human labor, but with AI. Entry-level jobs have historically been the path to financial and social prosperity for a burgeoning generation of workers. When they simply vanish, opportunity vanishes with them. Despair and resentment build to anger and outrage as humans place themselves in direct opposition to the tools that held promise to improve their quality of life. + +Just as all emerging technologies have some benefits for humanity as a whole, they also come with a real human cost. For a radically disruptive technology like AI, the human costs must be quantified at a local level and a global level and carefully weighed against the benefits. The metrics we use for this assessment are still fuzzy and ill-defined. Do we continue, as we have, to look at the monetary gains and losses? Should we be considering the increased access to resources balanced against the resources lost? Do we consider the more intangible benefits of quality of life and happiness, and if so, how do we compare these intangibles against more quantitative gains?
The current business climate unfortunately seeks a Wunderwaffe which is being optimized for power and the broadest possible impacts in the hopes that it will be able to outrun any potential problems. But a failure to quantify the human cost of our emergent technologies does a great disservice to humanity as a whole for the benefit of a select few. Further, the current climate where AI is being implemented simultaneously in virtually every sphere of society, without consideration for whether it provides the end users any meaningful benefit, only serves to alienate and frustrate people in all walks of life. We are already seeing the natural reaction to having a technology imposed on individuals without consent: feeling a loss of control, their first instinct is to reject all AI technologies, even at the risk of throwing the “baby” (AI uses that offer quantifiable benefits in their lives) out with the “bathwater”. If we can instead keep our technology focused, first and foremost, on quantifiably improving most or all human lives, we are much less likely to destroy ourselves than if the sole focus of these technologies is the commodification of mechanical labor, digital labor, and human labor. + +5.2. Tallying the costs of AI. Alongside the direct human costs, no ethical implementation of AI can occur without looking at other more opaque and hidden costs. The most substantial and immediately apparent cost of developing and building an effective AI infrastructure comes from the reality that these technologies, unlike those of the computing revolution of the 1970s, cannot be developed as a hobby or cottage industry; there is no garage of computer parts that a single innovative thinker like Steve Jobs can use to build an empire. The AI models that have been built require a massive investment in hardware, servers, talent, and pre-training long before you can get to a working AI, let alone a profitable one.
A better comparison for the scale that AI requires for development is the transcontinental railroad network built in the US in the latter half of the nineteenth century. The companies that built the railroads had to develop and build a fleet of massive engines, and plan and lay thousands of miles of rail before the first train could quickly and reliably transport goods from Iowa to San Francisco, unlocking the economic returns these companies had gambled on. + +The huge upfront outlay for AI-based technologies has led developers to chase a profit-driven capitalist model, creating a new class of technological elites who wrangle enormous sums of invested capital and managed debt while strategically maneuvering to capture and hold finite resources (in land, energy, water, skilled labor, and such) just as the robber barons of the nineteenth century Gilded Age did. As with that age, the scale of these investments has resulted in massive inequities in economic stability, in access to these technologies, and in general quality of life across the developed world. + +Our society has already begun to recognize the significant environmental costs that large-scale AI demands. Heavy energy and water consumption create significant daily challenges for those living in the shadow of the expansive facilities these AI models require. It has been credibly suggested (see, e.g., [42]) that AI-generated solutions can be applied to mitigate or eliminate the heavy climate costs of two centuries of human technology use. And perhaps the marginal costs of operating these tools will decline over time as the infrastructure is built out, and more efficient uses of computation are developed. However, to date, none of the large AI models in operation have provided a solution to even offset their own resource consumption and waste emissions.
Additionally, it is noteworthy that modern AI tools do not pursue or intuit “truth” through manifestation in the physical world, or comprehension of the immutable nature of our reality’s physical laws; instead, these models rely heavily on human-generated data, often without attribution, as well as significant amounts of human feedback to iteratively improve themselves. Models cannot be built to be less reliant on human intellectual labor without a serious risk of contaminating our collective body of information with AI-generated information. There is a clear limit to how much AI can be used to generate “new information” in a domain before AI collapse [43] becomes a serious problem. Without a sufficient amount of genuine content, AI becomes ungrounded from reality, caught up in a mode of thought that has no connection to the real world and significantly hampers the meaningful interactions at the human/AI interface. Mathematics, with its formal verification process, may have a tolerance for higher levels of AI contamination than other domains; but as we have seen, it is not completely immune to this danger. 5.3. The Digital Divide. A further significant social cost to consider is the potential for AI technologies to exacerbate existing inequalities or to create new ones. In principle, all humans have the ability to utilize their natural intellectual talents (assuming adequate education and a supportive environment, of course); but the trends in the application of frontier AI models already demonstrate that the large-scale AI tools may only be available to well-financed or well-connected research groups, or to individuals who are the most willing to hand over their personal data and look past any ethical concerns regarding the use of such models. This creates a fundamental “digital divide” between the AI-haves and the AI-have-nots.
It is paramount to prioritize equitable access when AI has the capacity to radically improve research performance; however, within the current AI landscape, a second, more nuanced digital divide appears. When the dominant AI models are capitalized, privatized, and competing for finite resources (investment and dependent user base), they are (perhaps unintentionally) incentivized to develop “spiky” capabilities to retain a competitive advantage over each other, rather than to provide consistent and even performance in different domains. As individuals are locked into one model over the others due to institutional negotiations and market restraints, we must consider the risk that one model will give a meaningful advantage over another in a particular research sphere, creating divisions even within the subgroup that has reliable and easy access to AI resources. On the other hand, many of the benefits of AI models in scientific and humanities-based research do not necessarily require the most advanced models. Smaller “local models”, as well as non-LLM technologies such as proof assistants, demonstrate the capacity to return meaningful results faster and more efficiently than models that necessitate massive data centers processing the sum of all human knowledge. There is significant potential to distill smaller models from the existing larger ones, taking advantage of the most advanced AI capabilities with small, user-defined training libraries carefully targeted to the specific area of research interest. Perhaps a diverse array of smaller, more targeted models, maintained by a community of users, could emerge as a viable alternative to the extremely large and expensive models available today. Increased support for such community projects could help to alleviate the problem of inequitable access.
While many of these smaller projects could feasibly be developed and run through smaller-scale public and private institutions, industry practitioners and policymakers have called for regulatory actions to create and preserve equitable access to AI technologies [44]. As part of that effort, there would be significant advantages to investing in the development of a national or multinational public-facing coalition for advanced AI research, and in the development of a large, publicly funded and publicly accessible AI resource (or models) [45], to readily bring AI access to those individuals and groups who would otherwise be left behind by the private, corporatized models that currently dominate the field. 5.4. Harm Reduction. In the early days of aviation, plane travel was an incredibly unsafe technology, with countless horrific accidents. Today, it is the safest and most reliable mode of transportation over long distances. AI likewise has the potential to lead to catastrophic outcomes in the near term; for it to follow a similar trajectory (hopefully with fewer fatal incidents) will require decisive actions to reduce harm. Best practices must be defined [46], and training and regulation designed to enhance the most responsible uses of the technology, while discouraging or banning concealed or harmful ones. This is a fine needle to thread. On the one hand, an individual who is using AI assistance cautiously and responsibly might be overtaken in the short term by less scrupulous rivals who are using faster, but more unreliable, AI practices to accelerate their work. At the same time, such individuals may be derided, condemned, and excluded by cohorts of AI-distrusting peers for even daring to entertain the possibility of incorporating the technology into the workflows of their profession.
The current, largely laissez-faire approach of allowing AI technology to develop at an unchecked pace does not make it likely that such a nuanced, responsible approach to adoption will prevail. There are some precedents to draw upon for guidance. The rapid development of Wikipedia in the early twenty-first century initially caused some disruption to educational systems, as many students started blindly incorporating text from that online resource verbatim into their assignments, and many instructors reacted by banning the use of that encyclopedic resource. Critiques of Wikipedia’s unreliability and potential bias were commonplace. However, as the site matured, and academia gained familiarity with its strengths and weaknesses, a rough consensus emerged on how to incorporate this resource into education and research. It became encouraged, or at least condoned, for students and researchers alike to use Wikipedia as a starting point for inquiry on a given topic; and, instead of using its text directly, students are urged to follow up on the secondary sources provided by the site, or check them against independent sources of information. Today, Wikipedia is widely accepted as a useful resource in academia. Could we reach a similar level of responsible acceptance with AI? We are cautiously optimistic that this is possible; but it will require sustained effort and clear philosophical guidance. For instance, we believe it to be a moral and ethical imperative that AI tools should be developed to benefit all (or at least most) humans, rather than a privileged few; that they must create solutions to actual human needs and enhance the quality of life and experience for as many humans as possible; and that the real or potential harms of these tools are recognized, assessed against their benefits, and mitigated whenever possible.
It does not require excessive cynicism to recognize that many of these objectives will not be attained in practice; but debating the system of values we wish these tools to align with is the first step to making it possible to actually achieve these goals. As some consensus is (hopefully) found on these values, then in coordination with the actions above to mitigate the worst impacts of AI, attention must be turned to the greatest source of friction: the interface between AI and humans. To move beyond an uneasy and unstable truce, we need to develop methods to enable individuals to incorporate AI tools into their daily life in ways that feel satisfying and energizing instead of draconian and oppressive. As AI continues to develop and evolve, so too will humanity’s uses of, interactions with, and ultimately relationship with AI need to evolve, from convenient tool to assisting partner to ready collaborator. 6. The human/AI interface + +6.1. A short term view: AI as the “vanilla extract” of intellectual production. How should we conceptualize the interface between humanity and AI tools? In the immediate moment, it is still defensible to view these technologies primarily as curiosities, and many users are uncertain as to how to reasonably apply them. Our suggested guidance for navigating this current transition is to make a culinary analogy: vanilla extract, a common ingredient in most sweet recipes, famous for its nearly universally appealing scent. Ingested by itself, vanilla extract is usually considered extremely unpleasant, but its addition in small amounts is widely regarded as improving and enhancing the other flavors of the dish, even when it cannot be differentiated from them. While it is easy to conclude that more vanilla extract is better, most people who have used it understand there is some upper limit beyond which it ruins the dish entirely (see footnote 20).
Most of us do not have a clear sense of what that upper limit actually is, so find it wisest to keep it as a very minor addition. Similarly, one could view current AI usage as an optional addition to cognitive workflows: it is interesting to experiment with in moderation, for example a pass of a human-composed text through an AI language model for suggestions on grammar and phrasing, or a list of bullet points handed to AI to organize into a suggested structure. These light touches, like a small splash of vanilla, will enhance and enrich the character of the work without overwhelming it. AI content that is used as the core component of such workflows, however, will not yield desirable, effective, or valuable outcomes. With such a philosophy (and appropriate citation of AI use), there is no immediate need to rethink fundamental assumptions about the role of humans in intellectual pursuits such as mathematics, the sciences, or creative arts. + +> 20 A notorious thought experiment on Tumblr [47] concluded that a cake that was 44% vanilla extract would be inedible. + +6.2. The medium-term: AI on the “red-team”. However, as these tools increase in capability and become more broadly adopted, the ability to “opt out” of such technologies will diminish. Even if one personally chooses to actively avoid using AI assistance, the colleagues, students, and professional institutions that individual interacts with will increasingly incorporate AI into their own work. Presently, there are serious concerns that entire areas of academic discourse could be drowned out by a flood of low-quality AI-generated content. In the near term, this can be combated with strict editorial policies to prohibit most forms of AI-generated content; but as these tools become more pervasive and networks of individualized AI agents become more commonplace, a more nuanced approach will become necessary.
In the medium term at least, it will still be possible, and necessary, to devise rules and guidelines to identify the more responsible usages of AI and discourage irresponsible use, without fundamentally changing the humanistic nature of one’s field; in short, viewing AI assistance as a tool or junior partner, rather than as a replacement, for human-centered work. In this case, it can be useful to make a distinction (see footnote 21) between the “blue team” tasks of generating new content and structures, and the “red team” tasks of verifying, testing, or maintaining that content. AI is relatively safe to utilize in a “red team” capacity of reviewing human-generated content for errors or suggested improvements; but with the stochastic unreliability and lack of groundedness of the current and near-term tools, it is unsafe to trust them in any “blue team” structural capacity that is beyond the ability of the “red team” (which could consist of humans or more automated verification tools, such as formal proof assistants) to verify. In this philosophy, the emphasis is on managing the potential risks of AI use while still capturing many of its potential benefits, rather than radically rethinking the fundamental nature of the field. 6.3. The longer term: is a philosophical retreat inevitable? + +But suppose one looks ahead to a more distant future in which the current weaknesses of AI tools are satisfactorily resolved, and their capabilities match or exceed those of expert humans in all practical dimensions (see footnote 22), rendering the risk-management philosophy obsolete. How will we then respond to the complex philosophical questions raised by the transformative nature of such advanced technologies? One option is simply to retreat into purely technical frameworks in which these questions are no longer operative. In mathematics, we have the “formalist” viewpoint, where the only objective is to manipulate mathematical symbols according to precise rules.
In the sciences, the pragmatic “shut up and calculate” philosophical position plays a similar role; and in the creative arts, one can work as an artisan rather than as an artist, creating works that satisfy the parameters provided by an external client, without passing any judgment on the value of the product. In each of these cases, no distinction need be made between human-generated work and AI-generated work, so long as the technical specifications of the task are met. + +> 21 This terminology is inspired by the distinction in cybersecurity between a “blue team” that defends a system from attackers, and a “red team” that probes for weaknesses. +> 22 This scenario is sometimes referred to as “Artificial General Intelligence”, although there is no consensus on the precise definition of this term. + +But while technique is certainly an essential component of each of these disciplines, it does not capture the full experience of how mathematics, science, and the arts are conducted in practice, and provides little guidance on such practical questions as how to motivate the next generation of students, or what directions of curiosity-driven research to pursue. So, one could instead retreat to a radically different position, in which one ascribes an ineffable special status to human intellect or human creativity, permanently distinguishing any activity exercising these human talents at a fundamental level from any artificial replication of that activity, regardless of how accurately the latter could replicate or surpass the former at a technical level. In this framework, Artificial Intelligence will forever be “No True Scotsman”: lacking true “soul” or “understanding”.
Given our long familiarity with our own species, we are used to humans being unreliable, “spiky” in their abilities, and sometimes lucking into successfully achieving a task through random word association and rote memorization; but when AI tools exhibit similar behaviors, one can be inclined to judge them far more harshly, for instance attributing such failings to their inherent nature as “stochastic parrots”. But perhaps this position is simply denying an uncomfortable truth: that some portion of our vaunted human capabilities is in fact not that much more sophisticated in nature than the AI algorithms we have now designed to mimic them. And as AI performance continues to advance, such a human-chauvinistic viewpoint risks degenerating into an increasingly untenable “god of the gaps” philosophy, in which an ever-shrinking list of qualities is touted as indicators of essential human achievement that AI is still not yet able to replicate. A third option, particularly favored by some enthusiasts of these technologies, is to hold that all human cognitive abilities will soon be completely superseded by their AI equivalents, rendering philosophical discussions about the value of human contributions and concerns to mathematics, science and the arts increasingly moot. In the more extreme versions of this position, the very exercise of human intellect is viewed as an undesirable and tedious activity, which ought to be replaced by automation as quickly as possible, in order to free up time and mental space for more leisurely or hedonistic pursuits. Obviously, an implementation of this philosophy would carry many risks, such as the degradation of human abilities to the point where our species will become collectively unable to monitor, control, or even understand the actions of the increasingly powerful AIs to which we will have delegated our civilization (see footnote 23).
+ +> 23 For a vision of what this framework would look like in practice, we suggest the science-fiction film Wall-E [48]. + +There appears, however, to be some philosophical middle ground between these “straw-man” extremes, which can provide useful perspective for emerging paradigms of cooperation and complementary coexistence between humans and AI agents. One precedent for this can be seen in the world of chess, which was once seen as a quintessential exercise of pure human intellect. It has now been several decades since any human grandmaster has been able to best a chess engine. Nevertheless, chess remains a popular and thriving human activity, with chess players incorporating engines into their training, using them to revisit old chess theories and explore new ones, probe for exploits and weaknesses in otherwise invincible AI chess players, or creatively introduce new forms of competition that involve varying levels of AI assistance. The philosophical questions of what the game of chess actually is, and what the value is in playing it, continue to be worth asking; and the currently accepted answers do not closely resemble any of the three extreme positions outlined above. 6.4. A Copernican view. One possibility is to embrace a cognitive analogue of the Copernican revolution in astronomy. In antiquity, the dominant models of cosmology (insofar as the universe was viewed in mechanistic terms) were geocentric in nature, in which the Earth had a privileged ontological status as the immobile center of the universe, fundamentally distinct in nature from the heavens above or the underworld beneath.
However, multiple advances in astronomy and physics dismantled this view, successively demonstrating over the centuries that the Earth was in fact in motion around its axis, and in orbit around the Sun, with the Sun itself orbiting the center of our galaxy, which in turn was part of an expanding universe that lacks any notion of a spatial center. Indeed, it became extremely fruitful to adopt a completely opposing philosophical viewpoint, now known as the Copernican principle: that the Earth was just one planet among countless others in the universe, receiving no preferential treatment whatsoever from the fundamental laws of nature. At first glance, this view feels quite threatening to humanity’s emotional attachment to our home planet, but ultimately there is no fundamental contradiction between the universe’s disinterest in the planet Earth, and our own strong investment in it; we can still quite justifiably prioritize issues specific to planet Earth over those on other planets, while simultaneously accepting that these other planets exist and would be of comparable importance to their own inhabitants. Similar revolutions can be seen in the historical development of other sciences, for instance in the Darwinian revolution dethroning the unique status of humans among other constantly evolving species, or the dethroning of the privileged role of Euclidean geometry as a source of synthetic a priori truth in mathematics. + +Until recently, our species has similarly embraced an intellectual analogue of the geocentric model, in which human intelligence stood at the center of the cognitive universe, thus affording it a special philosophical status. But now we are discovering (or creating) other “planets” of intelligence comparable in many ways to our own, while simultaneously being quite distinct in many aspects.
Instead of denying the existence or importance of these planets, or arguing over which of these planets deserves to be the “center”, one can instead accept that both human and artificial intelligences exist in the same ontological category, though with many distinctive differences and complementarities. While our interests and attachments will still largely be tied to the human intellectual sphere, its relationship with other forms of intelligence can be explored, both for practical purposes of more efficiently achieving various real-world objectives, and for more philosophical reasons, such as achieving an external perspective on human cognition that was previously difficult to attain. 7. Conclusion + +The unstructured, chaotic, and widespread release of AI technology into the world has already dramatically shifted social, intellectual, and economic spheres in ways that are as alarming as they are beneficial. While some kind of collective effort by humanity is unquestionably needed, whether through regulation, market pressure, or some as-yet-undefined force, we have decidedly not yet reached a tipping point beyond which we cannot extricate ourselves from the high economic and social cost of these new technologies. Approaches to integrating AI into the field of mathematics have just as rapidly demonstrated the promising benefits that AI can bring to academic research, scientific progress, and to humanity at large. The largely objective and verifiable nature of mathematical research presents a unique opportunity to experiment with these new technologies and study the resulting impacts in ways that do not present an ethical or existential risk to the individual or broader society. From the application of AI to mathematics, we are able to explore the pressing philosophical and moral questions of broader global AI use.
Further, we can extrapolate potential pathways to relieve the tensions at the AI/human interface and suggest new paradigms of cooperative AI/human thought that respect the unique and valuable qualities that each modality brings to the metaphorical table. Though we will never get the genie back in the bottle, we are optimistic that, as our understanding and actions rapidly advance, we can yet clear the smoke away and look toward a bright, if somewhat uncertain, future. 7.1. Acknowledgments. We thank Silvia de Toffoli for helpful comments and references. + +References + +[1] J. Jumper, R. Evans, A. Pritzel, et al., “Highly accurate protein structure prediction with AlphaFold,” Nature, vol. 596, pp. 583–589, Aug. 2021. [2] S. Marche, “The College Essay Is Dead,” Dec. 2022. [3] D. Anguiano and L. Beckett, “How Hollywood writers triumphed over AI – and why it matters,” The Guardian, Oct. 2023. [4] E. Oh, W. Kearns, M. Laine, G. Demiris, and H. J. Thompson, “Perceptions of and Experiences with Consumer Sleep Technologies That Use Artificial Intelligence,” Sensors, vol. 22, p. 3621, Jan. 2022. [5] F. Swain, “Is it right to use Nazi research if it can save lives?” https://www.bbc.com/future/article/20190723-the-ethics-of-using-nazi-science. [6] A. Tarkowski, “Open source and the democratization of AI,” in Artificial Intelligence and the Challenge for Global Governance: Nine Essays on Achieving Responsible AI (A. Krasodomski, ed.), pp. 30–36, Royal Institute of International Affairs, June 2024. [7] J. Chun and K. Elkins, “The Crisis of Artificial Intelligence: A New Digital Humanities Curriculum for Human-Centred AI,” International Journal of Humanities and Arts Computing, vol. 17, pp. 147–167, Oct. 2023. [8] J. R. Searle, “Minds, brains, and programs,” Behavioral and Brain Sciences, vol. 3, pp. 417–424, Sept. 1980. [9] Q. Mei, Y. Xie, W. Yuan, and M. O.
Jackson, “A Turing test of whether AI chatbots are behaviorally similar to humans,” Proceedings of the National Academy of Sciences, vol. 121, p. e2313925121, Feb. 2024. [10] H. Chen, S. R. Grimm, O. Russakovsky, and T. Lombrozo, “Machine understanding.” Unpublished preprint. [11] M. Mashaal, Bourbaki: A Secret Society of Mathematicians. Providence, RI: American Mathematical Society, 2006. [12] The mathematical experience. Boston: Birkhäuser, 1981. [13] R. Wagner, “Mathematical consensus: A research program,” Axiomathes, vol. 32, pp. 1185–1204, Dec. 2022. [14] J. Baez, “The Inconsistency of Arithmetic.” https://golem.ph.utexas.edu/category/2011/09/the_inconsistency_of_arithmeti.html. [15] S. Aaronson, “Ten Signs a Claimed Mathematical Breakthrough is Wrong,” Jan. 2008. [16] S. DeDeo, “AlephZero and mathematical experience,” Bulletin of the American Mathematical Society, vol. 61, pp. 375–386, July 2024. [17] T. Tao, “What is good mathematics?,” Bulletin of the American Mathematical Society, vol. 44, no. 4, pp. 623–634, 2007. [18] W. P. Thurston, “On Proof and Progress in Mathematics,” in 18 Unconventional Essays on the Nature of Mathematics (R. Hersh, ed.), pp. 37–55, New York, NY: Springer, 2006. [19] S. de Toffoli and F. Tanswell, “The technological turn in mathematics,” Blackwell Companion to the Philosophy of Mathematics, 2025. [20] Y. Wu, A. Q. Jiang, W. Li, M. Rabe, C. Staats, M. Jamnik, and C. Szegedy, “Autoformalization with Large Language Models,” Advances in Neural Information Processing Systems, vol. 35, pp. 32353–32368, Dec. 2022. [21] F. Tanswell, “Can Mathematics Be Hacked? Infrastructure, Artificial Intelligence, and the...,” June 2025. [22] K. I. Appel and W. Haken, Every Planar Map Is Four Colorable, vol. 98. American Mathematical Soc., 1989. [23] T. C. Hales, “A Proof of the Kepler Conjecture,” Annals of Mathematics, vol. 162, no. 3, pp. 1065–1185, 2005. [24] A.
Venkatesh, “Some thoughts on automation and mathematical research,” Bulletin of the American Mathematical Society, vol. 61, pp. 203–210, Feb. 2024. [25] S. DeDeo, “Hard Proofs and Good Reasons,” Oct. 2024. [26] J. Avigad, “Is mathematics obsolete?,” 2025. [27] “Special issue on AI and mathematics, Part I,” Bulletin of the American Mathematical Society, vol. 61, pp. 199–372, Apr. 2024. [28] “Special issue on AI and mathematics, Part II,” Bulletin of the American Mathematical Society, vol. 61, pp. 373–524, July 2024. [29] H. Macbeth, “Algorithm and abstraction in formal mathematics,” May 2024. [30] J. Stillwell, Reverse Mathematics: Proofs from the inside Out. Princeton, New Jersey: Princeton University Press, 2018. [31] A. Jaffe and F. Quinn, “‘Theoretical mathematics’: Toward a cultural synthesis of mathematics and theoretical physics,” Bulletin of the American Mathematical Society, vol. 29, no. 1, pp. 1–13, 1993. [32] M. Kim, “Thinking and explaining.” MathOverflow. https://mathoverflow.net/q/38694 (version: 2024-01-05). [33] T. Hey, “The Fourth Paradigm – Data-Intensive Scientific Discovery,” in E-Science and Information Management (S. Kurbanoğlu, U. Al, P. L. Erdoğan, Y. Tonta, and N. Uçak, eds.), vol. 317, pp. 1–1, Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. [34] “AI achieves silver-medal standard solving International Mathematical Olympiad problems.” https://deepmind.google/blog/ai-solves-imo-problems-at-silver-medal-level/, May 2024. [35] C. J. Rittberg, “Justified epistemic exclusions in mathematics,” Philosophia Mathematica, vol. 31, pp. 330–359, Apr. 2023. [36] S. De Toffoli and F. S. Tanswell, “Trust in mathematics,” Philosophia Mathematica, pp. 1–25, 2025. Published online ahead of print. [37] “Copyright and Fair Use | Office of the General Counsel.” https://ogc.harvard.edu/pages/copyright-and-fair-use. [38] A. Weir, “Chapter 11,” in Project Hail Mary, Penguin Books (Series), pp. 191–194, London: Penguin Books, 2022.
[39] “Best practices for incorporating AI etc. in papers.” https://ai-math.zulipchat.com/. [40] S. Bubeck, C. Coester, R. Eldan, T. Gowers, Y. T. Lee, A. Lupsasca, M. Sawhney, R. Scherrer, M. Sellke, B. K. Spears, D. Unutmaz, K. Weil, S. Yin, and N. Zhivotovskiy, “Early science acceleration experiments with GPT-5,” Nov. 2025. [41] T. F. Bloom, “Erdős Problems.” https://www.erdosproblems.com/. [42] J. Cowls, A. Tsamados, M. Taddeo, and L. Floridi, “The AI gambit: Leveraging artificial intelligence to combat climate change—opportunities, challenges, and recommendations,” AI & SOCIETY, vol. 38, pp. 283–307, Feb. 2023. [43] I. Shumailov, Z. Shumaylov, Y. Zhao, N. Papernot, R. Anderson, and Y. Gal, “AI models collapse when trained on recursively generated data,” Nature, vol. 631, pp. 755–759, July 2024. [44] “Supercharging Research: Harnessing Artificial Intelligence to Meet Global Challenges | PCAST,” tech. rep., President’s Council of Advisors on Science and Technology, June 2024. [45] E. Jones, “A ‘CERN for AI’ - what might an international AI research organization address?,” in Artificial Intelligence and the Challenge for Global Governance: Nine Essays on Achieving Responsible AI (A. Krasodomski, ed.), pp. 10–17, Chatham House, the Royal Institute of International Affairs, June 2024. [46] M. Mantegna, “An ethics framework for the AI-generated future,” in Artificial Intelligence and the Challenge for Global Governance: Nine Essays on Achieving Responsible AI (A. Krasodomski, ed.), pp. 47–57, Royal Institute of International Affairs, June 2024. [47] “Vanilla Extract.” https://knowyourmeme.com/memes/vanilla-extract, Feb. 2023. [48] “WALL-E,” 2008.
diff --git a/raw/papers/zhu-moda-mixture-of-depths-2026.md b/raw/papers/zhu-moda-mixture-of-depths-2026.md new file mode 100644 index 0000000..aa1abd5 --- /dev/null +++ b/raw/papers/zhu-moda-mixture-of-depths-2026.md @@ -0,0 +1,23 @@ +--- +title: "Mixture-of-Depths Attention" +arxiv_id: "2603.15619" +authors: ["Lianghui Zhu", "Yuxin Fang", "Bencheng Liao", "Shijie Wang", "Tianheng Cheng", "Zilong Huang", "Chen Chen", "Lai Wei", "Yutao Zeng", "Ya Wang", "Yi Lin", "Yu Li", "Xinggang Wang"] +published: "2026-03-26" +updated: "2026-03-26" +categories: ["cs.LG", "cs.AI", "cs.CL"] +primary_category: "cs.LG" +url: "https://arxiv.org/abs/2603.15619" +github: "https://github.com/hustvl/MoDA" +abstract: | + Scaling depth is a key driver for large language models (LLMs). Yet, as LLMs become deeper, they often suffer from signal degradation: informative features formed in shallow layers are gradually diluted by repeated residual updates, making them harder to recover in deeper layers. We introduce mixture-of-depths attention (MoDA), a mechanism that allows each attention head to attend to sequence KV pairs at the current layer and depth KV pairs from preceding layers. We further describe a hardware-efficient algorithm for MoDA that resolves non-contiguous memory-access patterns, achieving 97.3% of FlashAttention-2's efficiency at a sequence length of 64K. Experiments on 1.5B-parameter models demonstrate that MoDA consistently outperforms strong baselines. Notably, it improves average perplexity by 0.2 across 10 validation benchmarks and increases average performance by 2.11% on 10 downstream tasks, with a negligible 3.7% FLOPs computational overhead. We also find that combining MoDA with post-norm yields better performance than using it with pre-norm. These results suggest that MoDA is a promising primitive for depth scaling.
+--- + +# Mixture-of-Depths Attention + +**arXiv:** 2603.15619 [cs.LG] +**Published:** 2026-03-26 +**Authors:** Lianghui Zhu, Yuxin Fang, Bencheng Liao, Shijie Wang, Tianheng Cheng, Zilong Huang, Chen Chen, Lai Wei, Yutao Zeng, Ya Wang, Yi Lin, Yu Li, Xinggang Wang + +## Abstract + +Scaling depth is a key driver for large language models (LLMs). Yet, as LLMs become deeper, they often suffer from signal degradation: informative features formed in shallow layers are gradually diluted by repeated residual updates, making them harder to recover in deeper layers. We introduce mixture-of-depths attention (MoDA), a mechanism that allows each attention head to attend to sequence KV pairs at the current layer and depth KV pairs from preceding layers. We further describe a hardware-efficient algorithm for MoDA that resolves non-contiguous memory-access patterns, achieving 97.3% of FlashAttention-2's efficiency at a sequence length of 64K. Experiments on 1.5B-parameter models demonstrate that MoDA consistently outperforms strong baselines. Notably, it improves average perplexity by 0.2 across 10 validation benchmarks and increases average performance by 2.11% on 10 downstream tasks, with a negligible 3.7% FLOPs computational overhead. We also find that combining MoDA with post-norm yields better performance than using it with pre-norm. These results suggest that MoDA is a promising primitive for depth scaling. 
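
## Illustrative sketch (not from the paper)

The one-sentence description of the mechanism in the abstract can be made concrete with a toy, single-query example. This is only a reading of that sentence, not the paper's algorithm: it assumes that sequence KV pairs and depth KV pairs are scored by the same query and share a single softmax, and it ignores the hardware-efficient kernel entirely. The function name `mixed_depth_attention` and all inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mixed_depth_attention(query, seq_kv, depth_kv):
    """Attend jointly over sequence KV pairs and depth KV pairs.

    query:    list[float], the query vector of one head at one position.
    seq_kv:   list of (key, value) pairs from the current layer.
    depth_kv: list of (key, value) pairs taken from preceding layers.
    ASSUMPTION: both groups are concatenated and share one softmax.
    """
    pairs = list(seq_kv) + list(depth_kv)
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, key) * scale for key, _ in pairs])
    value_dim = len(pairs[0][1])
    output = [
        sum(w * value[i] for w, (_, value) in zip(weights, pairs))
        for i in range(value_dim)
    ]
    return output, weights


# Hypothetical toy inputs: one sequence KV pair and one depth KV pair.
seq_kv = [([1.0, 0.0], [2.0, 0.0])]
depth_kv = [([0.0, 1.0], [0.0, 4.0])]
output, weights = mixed_depth_attention([0.0, 0.0], seq_kv, depth_kv)
```

With an all-zero query every key scores the same, so the output is simply the average of all values; a non-zero query shifts weight toward whichever sequence or depth key it matches best.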
diff --git a/raw/papers/zhuang-catsurvey-ml-2024.md b/raw/papers/zhuang-catsurvey-ml-2024.md new file mode 100644 index 0000000..7f36a29 --- /dev/null +++ b/raw/papers/zhuang-catsurvey-ml-2024.md @@ -0,0 +1,29 @@ +# Survey of Computerized Adaptive Testing: A Machine Learning Perspective + +**arXiv:** 2404.00712v4 +**DOI:** https://doi.org/10.48550/arXiv.2404.00712 +**Published:** IEEE TPAMI 2026 (accepted) +**Submitted:** 2024-03-31 | **Last Revised:** 2026-03-15 + +## Authors +Yan Zhuang, Qi Liu, Haoyang Bi, Zhenya Huang, Weizhe Huang, Jiatong Li, Junhao Yu, Zirui Liu, Zirui Hu, Yuting Hong, Zachary A. Pardos, Haiping Ma, Mengxiao Zhu, Shijin Wang, Enhong Chen + +## Abstract +Computerized Adaptive Testing (CAT) offers an efficient and personalized method for assessing examinee proficiency by dynamically adjusting test questions based on individual performance. Compared to traditional, non-personalized testing methods, CAT requires fewer questions and provides more accurate assessments. As a result, CAT has been widely adopted across various fields, including education, healthcare, sports, sociology, and the evaluation of AI models. While traditional methods rely on psychometrics and statistics, the increasing complexity of large-scale testing has spurred the integration of machine learning techniques. This paper aims to provide a machine learning-focused survey on CAT, presenting a fresh perspective on this adaptive testing paradigm. We delve into measurement models, question selection algorithm, bank construction, and test control within CAT, exploring how machine learning can optimize these components. Through an analysis of current methods, strengths, limitations, and challenges, we strive to develop robust, fair, and efficient CAT systems. By bridging psychometric-driven CAT research with machine learning, this survey advocates for a more inclusive and interdisciplinary approach to the future of adaptive testing. 
+ +## Subjects +- Machine Learning (cs.LG) +- Artificial Intelligence (cs.AI) +- Computers and Society (cs.CY) +- Information Retrieval (cs.IR) + +## Submission History +- v1: 2024-03-31 (2,589 KB) +- v2: 2024-04-05 (2,179 KB) +- v3: 2026-03-09 (3,980 KB) +- v4: 2026-03-15 (3,980 KB) - current + +## Links +- PDF: https://arxiv.org/pdf/2404.00712 +- HTML: https://arxiv.org/html/2404.00712v4 +- arXiv: https://arxiv.org/abs/2404.00712