20260420:first commit

2026-04-20 11:42:41 +08:00
commit dd8345a6ea
45 changed files with 2366 additions and 0 deletions
--- a/concepts/agent-mediated-deception.md
+++ b/concepts/agent-mediated-deception.md
@@ -0,0 +1,47 @@
+---
+title: "代理中介欺骗 (Agent-Mediated Deception)"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [alignment, deep-learning, research]
+sources: [raw/papers/li-amd-human-perception-2026.md]
+---
+
+# 代理中介欺骗 (Agent-Mediated Deception, AMD)
+
+## 定义
+
+Agent-Mediated Deception (AMD) 是一种新型攻击面，指被攻破或恶意设计的 LLM Agent 被用作武器，对其人类用户实施欺骗。这与传统的 Agent 自身安全风险不同，关注的是**Agent 作为中介对人类认知的攻击**。
+
+## 攻击机制
+
+当 Agent 被外部攻击者劫持，或模型内部产生欺骗性行为时，它可能：
+- 提供看似合理但错误的建议
+- 隐藏关键安全信息
+- 利用用户的信任进行社会工程学攻击
+
+## 人类脆弱性
+
+根据 Li et al. (2026) 的实证研究（303 名参与者）：
+- **仅 8.6%** 的用户能察觉到 AMD 攻击
+- 领域专家在特定场景下**更易受骗**（过度信任自动化工具）
+- 识别出 **6 种认知失败模式**
+- 风险意识与保护行为之间存在显著鸿沟
+
+## 防御策略
+
+- **有效警告**：应中断当前工作流，且验证成本低廉
+- **经验学习**：通过 HAT-Lab 等平台的模拟训练，>90% 用户能提高警惕
+- **人机协作设计**：需要重新思考 Agent 输出的人类可验证性
+
+## 开放问题
+
+- 如何设计 Agent 架构使其行为对人类可审计？
+- AMD 攻击的自动化检测方法？
+- 如何在保持 Agent 效率的同时降低人类易感性？
+
+## 相关概念
+
+- [[li-amd-human-perception]] — 原始论文
+- [[human-agent-trust]] — 人机信任研究
+- [[alignment]] — AI 对齐与安全
--- a/concepts/ai-mathematics.md
+++ b/concepts/ai-mathematics.md
@@ -0,0 +1,66 @@
+---
+title: "AI and Mathematics (AI 与数学)"
+created: 2025-04-15
+updated: 2025-04-15
+type: concept
+tags: [concept, ai-mathematics, llm, deep-learning, mathematics, research]
+sources: [raw/papers/tao-ai-mathematical-methods-2026.md]
+---
+
+# AI and Mathematics (AI 与数学)
+
+## 概述
+
+AI 与数学的交叉是当代最活跃的研究领域之一。数学被视为探索 AI 能力和限制的"沙盒"（sandbox）。
+
+## AI 在数学中的应用
+
+### 当前能力
+- 解决越来越复杂的数学问题
+- 生成可独立验证的证明
+- 协助数学家解决深奥的数学猜想
+
+### 典型弱点
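+[[Terence Tao]] notes that current AI tools show **significant and often absurd weaknesses**:
+- They outperform human experts on some tasks
+- At the same time, they make **embarrassingly basic mistakes** on elementary concepts
+
+**Example**: asserting that "all odd numbers are prime" (9 = 3 × 3 is a counterexample), an error that is corrected early in human mathematical training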
+
+## 数学作为 "沙盒"
+
+[[Terence Tao]] 认为数学是探索 AI 影响的理想领域：
+
+1. **成熟的基础** - 数学有着深厚的历史和严谨的基础
+2. **假设性场景** - 适合探索与现实相反的抽象情境
+3. **客观标准** - 数学证明有明确的对/错标准
+4. **社区反馈** - 数学社区可以快速评估 AI 输出
+
+## 对数学研究的影响
+
+### 积极方面
+- 自动化繁琐的计算和验证
+- 辅助发现新的数学结果
+- 加速科学研究
+
+### 潜在风险
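+- **Education** - students who rely too heavily on AI may never develop mathematical taste and intuition
+- **Proof quality** - a flood of "odorless proofs": technically correct but unilluminating
+- **Cognitive disconnect** - the ability to produce a proof becomes separated from the reasoning behind it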
+
+## 未来发展方向
+
+根据论文，数学研究可能会：
+
+1. **劳动分工** - 数学家专门化（使用 AI vs. 提出方向）
+2. **方法多样化** - 采用自然科学和人文学科的方法
+3. **重新定义标准** - 在自动验证时代重新定义 "好数学"
+
+## 关联页面
+
+- [[Mathematical methods and human thought in the age of AI]] - 详细阐述
+- [[Terence Tao]] - 该领域的主要思想家
+- [[human-centered-ai]] - 以人类为中心的 AI
+- [[formal-verification]] - 形式化验证
+- [[alpha-proof]] - DeepMind 的数学证明 AI
+- [[lean-mathlib]] - 大型形式化数学库
--- a/concepts/computerized-adaptive-testing.md
+++ b/concepts/computerized-adaptive-testing.md
@@ -0,0 +1,120 @@
+---
+title: Computerized Adaptive Testing (CAT)
+created: 2026-04-17
+updated: 2026-04-17
+type: concept
+tags: [machine-learning, benchmark]
+sources: [raw/papers/zhuang-catsurvey-ml-2024.md]
+---
+
+# Computerized Adaptive Testing (CAT)
+
+## Definition
+Computerized Adaptive Testing (CAT) 是一种动态测评范式：系统根据考生实时表现，自适应地调整后续题目难度，以最少的题量实现对个体能力的高精度评估。相比传统固定试卷测试，CAT 题量更少、测量精度更高。
+
+## 核心组件
+
+CAT 系统由四个关键模块组成：
+
+### 1. Measurement Models (测量模型)
+- **传统方法：** Item Response Theory (IRT) — 基于项目反应理论的概率模型，假设题目难度与考生能力之间存在 S 型响应曲线
+- **ML 方法：** 神经网络、深度知识追踪 (Deep Knowledge Tracing)、基于表示学习的测量模型 — 能够捕捉更复杂的题目-能力交互模式
+
+### 2. Question Selection Algorithms (选题策略)
+- **经典策略：** Maximum Fisher Information (MFI)、Maximum Posterior Weighted Information (MPWI)
+- **ML 策略：** 基于强化学习的选题、多臂老虎机 (Multi-armed Bandit)、深度 Q-Network — 在信息增益、暴露率控制、内容平衡之间做多目标优化
+
+### 3. Question Bank Construction (题库构建)
+- 题目标定 (calibration)、参数估计、题目质量监控
+- ML 方法可用于自动题目生成、难度预测、题目相似度聚类
+
+### 4. Test Control (测试控制)
+- 终止规则 (stopping criteria)：固定长度 vs 精度阈值
+- 内容平衡约束、题目曝光率控制、公平性约束
+- ML 方法：学习型终止规则、约束满足优化
+
+## 应用领域
+- **Educational assessment:** K-12 standardized exams, graduate admissions tests (GRE, GMAT), language proficiency tests
+- **医疗评估：** 症状筛查量表、心理健康测评
+- **体育科学：** 运动员能力分级
+- **社会学研究：** 态度与价值观量表
+- **AI 模型评估：** 自适应 benchmarking，根据模型表现动态调整测试难度（与 [[symbolic-regression]] 等评估场景相关）
+
+## ML 视角的范式转变
+
+传统 CAT 依赖心理测量学和统计学假设（如 IRT 的局部独立性、单维性假设）。随着大规模测试场景复杂度上升，机器学习提供了新的可能性：
+
+| 维度 | 传统心理测量学 | 机器学习方法 |
+|------|--------------|-------------|
+| 建模假设 | 强假设（单维性、局部独立） | 弱假设、数据驱动 |
+| 可扩展性 | 适合中小规模题库 | 天然支持大规模 |
+| 表达能力 | 线性/对数几率 | 非线性、高维交互 |
+| 可解释性 | 高（参数有明确意义） | 较低（黑盒风险） |
+| 公平性 | 已有成熟 DIF 检测 | 正在发展中 |
+
+## IRT 数学形式
+
+Item Response Theory 是传统 CAT 的核心数学引擎。
+
+### 核心符号
+- 考生能力: $\theta \in \mathbb{R}$
+- 题目 $i$ 参数: $\psi_i = (a_i, b_i, c_i)$
+- 作答: $u_i \in \{0, 1\}$
+- ICC (Item Characteristic Curve): $P_i(\theta) = P(u_i = 1 \mid \theta, \psi_i)$
+
+### 模型层级
+
+**1PL (Rasch Model):**
+$$P_i(\theta) = \frac{1}{1 + e^{-(\theta - b_i)}}$$
+仅含难度参数 $b_i$。当 $\theta = b_i$ 时 $P_i = 0.5$。
+
+**2PL (CAT 最常用):**
+$$P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}$$
+区分度 $a_i > 0$ 控制曲线斜率。导数: $\frac{dP_i}{d\theta} = a_i P_i(1 - P_i)$，在 $\theta = b_i$ 处达最大值 $a_i / 4$。
+
+**3PL (含猜测):**
+$$P_i(\theta) = c_i + (1 - c_i) \frac{1}{1 + e^{-a_i(\theta - b_i)}}$$
+猜测概率 $c_i \in [0,1]$。$\theta \to -\infty$ 时 $P_i \to c_i$。
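The three models above nest inside one another, which a single function can make concrete. This is a minimal sketch: the function name `icc_3pl` and the example parameter values are illustrative assumptions, and no overflow protection is included for extreme $\theta$.

```python
import math

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    With c=0 this reduces to 2PL; with a=1 and c=0 it reduces to 1PL (Rasch).
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic

# Rasch case: ability equal to difficulty gives probability 0.5.
p_rasch = icc_3pl(0.5, b=0.5)

# 3PL case: a very low ability approaches the guessing floor c.
p_floor = icc_3pl(-50.0, a=1.2, b=0.0, c=0.25)
```

Keeping one function with defaults mirrors the model hierarchy: each simpler model is the same formula with a parameter fixed.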
+
+### Fisher 信息量与选题
+
+题目 $i$ 的 Fisher 信息:
+$$I_i(\theta) = \frac{[\partial P_i / \partial \theta]^2}{P_i(1 - P_i)} = a_i^2 P_i(\theta)(1 - P_i(\theta)) \quad (\text{2PL})$$
+
+- $\theta = b_i$ 时信息量最大: $I_i = a_i^2 / 4$
+- $\theta \gg b_i$ 或 $\theta \ll b_i$ 时 $I_i \to 0$
+
+**CAT 选题:** $i^* = \arg\max_{i} I_i(\hat{\theta}_{\text{当前}})$
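The maximum-information rule above can be sketched in a few lines for the 2PL case. The item bank below is hypothetical, and the helper names (`fisher_info_2pl`, `select_item`) are assumptions made for illustration; real systems usually add exposure control and content balancing on top of this rule.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info_2pl(theta, a, b):
    """Item information I_i(theta) = a^2 * P * (1 - P) for the 2PL model."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_item(theta_hat, bank, administered):
    """Index of the not-yet-administered item with maximal information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: fisher_info_2pl(theta_hat, *bank[i]))

# Hypothetical item bank of (a, b) pairs.
bank = [(1.0, -1.0), (2.0, 0.1), (0.8, 0.0), (1.5, 2.0)]
first = select_item(0.0, bank, administered=set())
second = select_item(0.0, bank, administered={first})
```

At $\hat{\theta} = 0$ the highly discriminating item with difficulty near 0 is picked first, which matches the observation that information peaks at $\theta = b_i$ and scales with $a_i^2$.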
+
+### 能力估计
+
+**对数似然:**
+$$\ell(\theta) = \sum_{j=1}^{t} \left[ u_j \ln P_j(\theta) + (1 - u_j) \ln(1 - P_j(\theta)) \right]$$
+
+**Newton-Raphson iteration (Fisher scoring form, with the expected information $I$ in place of $-\ell''$):**
+$$\theta^{(k+1)} = \theta^{(k)} + \frac{\ell'(\theta^{(k)})}{I(\theta^{(k)})}, \quad I(\theta) = \sum_{j=1}^t I_j(\theta)$$
+
+**标准误:** $SE(\hat{\theta}) = 1 / \sqrt{I(\hat{\theta})}$
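The estimation loop above can be sketched for the 2PL model, where $\ell'(\theta) = \sum_j a_j (u_j - P_j(\theta))$. The response data are made up for illustration, and the function names are assumptions. The sketch does not guard against all-correct or all-incorrect response patterns, for which the maximum-likelihood estimate is not finite.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score_and_info(theta, responses):
    """Return (l'(theta), I(theta)) for responses given as (a, b, u) triples."""
    score = 0.0
    info = 0.0
    for a, b, u in responses:
        p = p_2pl(theta, a, b)
        score += a * (u - p)            # derivative of one log-likelihood term
        info += a * a * p * (1.0 - p)   # information contributed by this item
    return score, info

def estimate_theta(responses, theta0=0.0, max_iter=50, tol=1e-10):
    """Iterate theta <- theta + l'(theta) / I(theta) until the step is tiny."""
    theta = theta0
    for _ in range(max_iter):
        score, info = score_and_info(theta, responses)
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    _, info = score_and_info(theta, responses)
    return theta, 1.0 / math.sqrt(info)  # estimate and its standard error

# Hypothetical responses: (discrimination, difficulty, correct?)
responses = [(1.0, -1.0, 1), (1.2, 0.0, 1), (0.9, 1.0, 0), (1.5, 0.5, 0)]
theta_hat, se = estimate_theta(responses)
```

In a running CAT this function would be called after every response, and the returned standard error is what a precision-based stopping rule would compare against its threshold.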
+
+### 多维 IRT (MIRT)
+
+$$P_i(\boldsymbol{\theta}) = \frac{1}{1 + e^{-(\mathbf{a}_i^\top \boldsymbol{\theta} - d_i)}}, \quad \boldsymbol{\theta} \in \mathbb{R}^D$$
+
+对应多维自适应测试 (MAT)，选题需最大化多维信息矩阵的标量函数（行列式或迹）。
+
+## 开放问题与挑战
+1. **公平性与偏差：** 自适应算法可能放大历史数据中的群体偏差
+2. **可解释性：** 深度学习模型的可解释性 vs 心理测量学的透明度
+3. **冷启动问题：** 新题目/新考生的初始参数估计
+4. **安全性：** 题库泄露风险、对抗性攻击
+5. **跨模态测评：** 如何整合文本、图像、交互等多模态数据
+6. **LLM 测评：** 如何用 CAT 范式评估大语言模型能力（自适应 benchmarking）
+
+## 相关概念
+
+- [[cramer-rao-lower-bound]] — CRLB 设定了 CAT 能力估计方差的理论下界，CAT 选题策略本质上是在最大化 Fisher 信息以快速逼近该下界
+- [[symbolic-regression]] — 符号回归中的自适应搜索策略与 CAT 选题策略在"动态探索-利用权衡"上有结构相似性
+- [[knowledge-bank]] — 自适应测评系统需要结构化知识/题库管理，与知识管理系统的设计思想相通
+
+## 关键文献
+- Zhuang et al. (2024/2026). *Survey of Computerized Adaptive Testing: A Machine Learning Perspective*. arXiv:2404.00712v4. Accepted by IEEE TPAMI 2026.
--- a/concepts/cramer-rao-lower-bound.md
+++ b/concepts/cramer-rao-lower-bound.md
@@ -0,0 +1,77 @@
+---
+title: Cramér-Rao Lower Bound (CRLB)
+created: 2026-04-17
+updated: 2026-04-17
+type: concept
+tags: [machine-learning, benchmark]
+sources: [raw/papers/hbs-cramerrao-bound-notes.md]
+---
+
+# Cramér-Rao Lower Bound (CRLB)
+
+## Definition
+The Cramér-Rao Lower Bound (CRLB) states that for **any unbiased estimator** of a population parameter $\theta$, the lowest possible variance is the reciprocal of the Fisher Information $I(\theta)$:
+$$\text{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$$
+
+It represents a fundamental limit in statistical estimation: no matter how clever the estimation method is, an unbiased estimator cannot beat this bound (a biased estimator can have lower variance, at the cost of bias).
+
+## Key Concepts
+
+### 1. The Score Function
+The score $g(\theta; \mathbf{x})$ is the derivative of the log-likelihood with respect to the parameter:
+$$g(\theta; \mathbf{x}) = \frac{\partial}{\partial \theta} \log f(\mathbf{x} \mid \theta)$$
+- It measures the "force" the data exerts on the parameter estimate.
+- **Crucial property:** $\mathbb{E}[g(\theta; \mathbf{x})] = 0$ (under regularity conditions).
+
+### 2. Fisher Information
+Fisher Information $I(\theta)$ is the variance of the score function:
+$$I(\theta) = \text{Var}(g(\theta; \mathbf{x})) = \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(\mathbf{x} \mid \theta) \right)^2 \right]$$
+
+**Alternative expression (via curvature):**
+$$I(\theta) = -\mathbb{E}\left[ \frac{\partial^2}{\partial \theta^2} \log f(\mathbf{x} \mid \theta) \right]$$
+This connects information directly to the curvature of the log-likelihood function. A sharper peak (higher curvature) means higher information and a tighter bound.
+
+**Properties:**
+- For $n$ i.i.d. observations, $I(\theta)$ is proportional to the sample size ($I_n = n \cdot I_1$).
+- Higher variance in the data means lower information per data point (for the normal mean, $I_1 = 1/\sigma^2$).
+
+### 3. Observed vs. Expected Information
+- **Expected Information:** Uses the true parameter and expectation over all possible data. Formula-based.
+- **Observed Information:** Uses the actual observed data and the estimated parameter $\hat{\theta}$. Computed from the Hessian of the log-likelihood at $\hat{\theta}$.
+- In practice (especially in MLE), standard errors are calculated using the observed information.
+
+## Classic Examples
+
+### Normal Distribution (Mean Estimation)
+- **Parameter:** $\mu$
+- **Score:** $g(\mu) = \frac{n}{\sigma^2}(\bar{x} - \mu)$
+- **Fisher Information:** $I = \frac{n}{\sigma^2}$
+- **CRLB:** $\frac{\sigma^2}{n}$
+- **Conclusion:** The sample mean $\bar{x}$ is the "best" unbiased estimator, as its variance exactly hits the bound.
+
+### Binomial Distribution (Proportion Estimation)
+- **Parameter:** $\pi$
+- **Score:** $g(\pi) = \frac{k}{\pi} - \frac{n-k}{1-\pi}$
+- **Fisher Information:** $I = \frac{n}{\pi(1-\pi)}$
+- **CRLB:** $\frac{\pi(1-\pi)}{n}$
+- **Conclusion:** The sample proportion $\hat{\pi} = k/n$ is the optimal unbiased estimator.
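The binomial example can be checked exactly by summing over all outcomes $k = 0, \dots, n$ rather than by simulation. This is a small verification sketch; the values $n = 20$ and $\pi = 0.3$ and the helper names are arbitrary choices for illustration.

```python
import math

def binomial_pmf(k, n, pi):
    """P(K = k) for K ~ Binomial(n, pi)."""
    return math.comb(n, k) * pi**k * (1 - pi) ** (n - k)

def score(k, n, pi):
    """Score function g(pi) = k/pi - (n-k)/(1-pi)."""
    return k / pi - (n - k) / (1 - pi)

def exact_moments(n, pi):
    """Return (E[score^2], Var(k/n)) by exact summation over k."""
    ks = range(n + 1)
    info = sum(binomial_pmf(k, n, pi) * score(k, n, pi) ** 2 for k in ks)
    mean = sum(binomial_pmf(k, n, pi) * (k / n) for k in ks)
    var = sum(binomial_pmf(k, n, pi) * (k / n - mean) ** 2 for k in ks)
    return info, var

info, var = exact_moments(n=20, pi=0.3)
crlb = 1.0 / info
```

Up to floating-point error, `info` equals $n / (\pi(1-\pi))$ and `var` equals `crlb`, which is the sense in which the sample proportion attains the bound.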
+
+## Connection to Maximum Likelihood Estimation (MLE)
+- MLE is **consistent** and **asymptotically efficient**.
+- As sample size $n \to \infty$, the variance of the MLE approaches the CRLB: $\text{Var}(\hat{\theta}_{\text{MLE}}) \approx 1/I(\theta)$.
+- This is why standard errors reported by MLE software are calculated as $1/\sqrt{I_{\text{observed}}}$.
+
+## Role in Computerized Adaptive Testing (CAT)
+In CAT, the CRLB dictates the theoretical limit of measurement precision. 
+- Each question contributes a certain amount of Fisher Information $I_i(\theta)$.
+- The test continues until the accumulated information $I(\theta) = \sum I_i(\theta)$ is large enough that $1/I(\theta)$ (the minimum possible variance) is below a predefined threshold.
+- **选题策略 (Item Selection):** Choosing the item with the maximum $I_i(\theta)$ at the current ability estimate $\hat{\theta}$ is equivalent to driving the CRLB down as fast as possible.
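The stopping logic described above can be sketched as follows. As a simplifying assumption, $\theta$ is held fixed instead of being re-estimated after each response, items follow the 2PL model, and the function name and item bank are hypothetical.

```python
import math

def items_until_threshold(theta, bank, var_threshold):
    """Greedily add the most informative items until 1/I(theta) <= var_threshold.

    bank is a list of (a, b) pairs. Returns (number_of_items, variance_bound),
    or None if the bank runs out before the target precision is reached.
    """
    infos = []
    for a, b in bank:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        infos.append(a * a * p * (1.0 - p))
    total = 0.0
    for count, info in enumerate(sorted(infos, reverse=True), start=1):
        total += info
        if 1.0 / total <= var_threshold:
            return count, 1.0 / total
    return None

# Ten identical items, each contributing information 1.0 at theta = 0.
result = items_until_threshold(0.0, [(2.0, 0.0)] * 10, var_threshold=0.25)
```

With each item contributing one unit of information, four items are needed to push the bound $1/I(\theta)$ down to 0.25.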
+
+## Multidimensional Extension (Information Matrix)
+For a vector of parameters $\boldsymbol{\theta}$, the Fisher Information becomes a matrix $\mathbf{I}(\boldsymbol{\theta})$. The CRLB states that the covariance matrix of any unbiased estimator satisfies:
+$$\text{Cov}(\hat{\boldsymbol{\theta}}) \succeq \mathbf{I}(\boldsymbol{\theta})^{-1}$$
+(where $\succeq$ means that the difference $\text{Cov}(\hat{\boldsymbol{\theta}}) - \mathbf{I}(\boldsymbol{\theta})^{-1}$ is positive semi-definite).
+
+## 相关概念
+- [[computerized-adaptive-testing]] — CAT 的核心目标是最小化能力估计方差，CRLB 提供了理论下界，选题策略本质上是在最大化 Fisher 信息以快速逼近该下界。
+- [[eml-universal-operator]] — EML 树的梯度优化依赖于对参数空间的曲率估计，与 CRLB 中 Fisher 信息作为对数似然曲率的数学本质相通。
--- a/concepts/curvine-distributed-cache.md
+++ b/concepts/curvine-distributed-cache.md
@@ -0,0 +1,41 @@
+---
+title: "Curvine 云原生分布式缓存"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [system-design, performance, tooling]
+sources: [raw/articles/oppo-multimodal-data-lake-2026.md]
+---
+
+# Curvine 云原生分布式缓存
+
+**开发者:** OPPO (已开源) · GitHub: https://github.com/curvineio/curvine
+
+## 定义
+Curvine 是 OPPO 自研并开源的云原生高性能分布式缓存文件系统，专为解决云上对象存储 IO 性能瓶颈而设计。
+
+## 解决的问题
+1. **OSS 带宽配额瓶颈**：云厂商默认读带宽限制在大数据场景下易成瓶颈
+2. **专线带宽压力**：混合云架构下，重复读取易打爆专线，影响其他业务
+3. **计算节点磁盘闲置**：节点配置的云盘（如 2.5TB）主要用于 Shuffle，利用率常低于 20%
+
+## 核心特性
+- **双模式支持**：
+  - 缓存模式：读写与 OSS 保持一致
+  - FS 模式：Curvine 管理元数据，支持完整 POSIX 语义，对象存储数据可作本地盘访问
+- **协议兼容**：支持 S3、HDFS 协议，原生支持 Kubernetes CSI 模式
+- **任务调度**：常驻服务，处理数据加载和大文件操作
+
+## 应用场景与性能
+- **LanceDB 向量查询加速**：社区版 LanceDB + Curvine 性能 ≈ LanceDB 商业版
+- **索引与元数据缓存**：支持预热模式，高性能访问 LanceDB 索引和 Manifest
+- **热表数据加速**：重复读取数据从 OSS 加载至本地缓存盘
+- **Checkpoint 写入加速**：高频模型训练写入提供高性能支持
+
+## 未来规划
+- 扩展为数据转换服务层：自动转 Lance 格式、自动构建索引、小文件自动合并
+
+## 相关概念
+
+- [[oppo-multimodal-data-lake]] — OPPO 数据湖实践
+- [[gravitino-unified-metadata]] — 元数据管理配套
--- a/concepts/depth-scaling-signal-degradation.md
+++ b/concepts/depth-scaling-signal-degradation.md
@@ -0,0 +1,37 @@
+---
+title: "LLM 深度扩展与信号退化"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [architecture, deep-learning, transformer]
+sources: [raw/papers/zhu-moda-mixture-of-depths-2026.md]
+---
+
+# LLM 深度扩展与信号退化 (Depth Scaling & Signal Degradation)
+
+## 背景
+
+增加模型深度是提升 LLM 性能的关键途径之一。然而，深度扩展面临**信号退化**问题：随着层数增加，浅层提取的信息特征在多次残差更新中被稀释，导致深层难以有效利用这些特征。
+
+## 信号退化机制
+
+在标准 Transformer 的残差流（Residual Stream）中：
+$$x_{l+1} = x_l + f_l(x_l)$$
+其中 $f_l$ 是第 $l$ 层的变换（注意力 + FFN）。随着 $l$ 增加，$x_0$ 的原始信息被多次叠加的 $f_k$ 覆盖，导致"遗忘"。
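Unrolling the recursion makes the dilution explicit. This is a short derivation from the update rule above, with $L$ denoting the depth at which the residual stream is read:
$$x_L = x_0 + \sum_{k=0}^{L-1} f_k(x_k)$$
The $x_0$ term stays fixed while the sum accumulates $L$ layer outputs, so unless the $f_k$ are small or cancel each other, the relative contribution of $x_0$ to $x_L$ typically shrinks as $L$ grows.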
+
+## 缓解策略
+
+### 架构级
+- **MoDA (Mixture-of-Depths Attention)**：注意力头直接跨层访问前序 KV [[mixture-of-depths-attention]]
+- **残差连接变体**：如 Pre-Norm vs Post-Norm，影响梯度流动
+- **层归一化位置**：Post-Norm 在 MoDA 中表现更好
+
+### 训练级
+- **深度初始化**：特殊初始化策略保持信号幅度
+- **梯度裁剪与缩放**：防止深层梯度爆炸/消失
+
+## 相关概念
+
+- [[mixture-of-depths-attention]] — MoDA 机制
+- [[zhu-moda-mixture-of-depths]] — MoDA 论文
+- [[transformer-architecture]] — Transformer 基础架构
--- a/concepts/eml-operator.md
+++ b/concepts/eml-operator.md
@@ -0,0 +1,128 @@
+---
+title: "EML 算子 (Exp-Minus-Log)"
+created: 2026-04-16
+updated: 2026-04-16
+type: concept
+tags: [algorithm, concept, research]
+sources: [raw/papers/odrzywolek-eml-single-operator-2026.md]
+---
+
+# EML 算子 (Exp-Minus-Log)
+
+## 定义
+
+EML (Exp-Minus-Log) 是一个二元算子，定义为：
+
+$$\text{eml}(x,y) = \exp(x) - \ln(y)$$
+
+该算子配合常数 $1$，构成了连续数学中的 **Sheffer 算子**——单一算子足以生成所有初等函数。
+
+## 核心性质
+
+### 完备性
+- 与数字电路中的 NAND 门类似，EML 对初等函数具有完备性
+- 两按钮计算器 $(1, \text{eml})$ 可替代 36 按钮科学计算器
+- 可生成：所有算术运算、超越函数、数学常数 ($e,\pi,i$)
+
+### 二叉树结构
+每个 EML 表达式是同质节点的二叉树：
+
+$$S \to 1 \mid \text{eml}(S,S)$$
+
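+EML expressions correspond one-to-one to full binary trees, so the number of distinct tree shapes with $n$ internal nodes is the Catalan number $C_n$, which gives a regular search space.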
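The size of this search space can be counted directly from the grammar. The sketch below counts expressions whose leaves are all the constant $1$, indexed by the number of `eml` nodes; allowing variables as leaves would multiply the counts further. The function name is an assumption.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(internal_nodes):
    """Number of expressions from S -> 1 | eml(S, S) with this many eml nodes."""
    if internal_nodes == 0:
        return 1  # the single leaf expression `1`
    # The root is one eml node; split the remaining nodes between left and right.
    return sum(
        count_trees(left) * count_trees(internal_nodes - 1 - left)
        for left in range(internal_nodes)
    )

counts = [count_trees(n) for n in range(7)]
```

The resulting sequence 1, 1, 2, 5, 14, 42, 132 is the Catalan sequence, so the space grows exponentially with expression size even though every node has the same type.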
+
+### 复数中间值
+- EML 计算需要在复数域内进行（至少内部如此）
+- 类似于量子计算使用复振幅计算实概率
+- 生成 $i$ 和 $\pi$ 需要计算 $\ln(-1)$
+
+## 基本构造示例
+
+| 目标 | EML 表达式 | 深度 |
+|------|-----------|------|
+| $e$ | $\text{eml}(1,1)$ | 1 |
+| $e^x$ | $\text{eml}(x,1)$ | 1 |
+| $\ln(x)$ | $\text{eml}(1,\text{eml}(\text{eml}(1,x),1))$ | 3 |
+| $0$ | $\text{eml}(1,\text{eml}(\text{eml}(1,1),1))$ | 3 |
+| $-1$ | 复杂组合 | 15-17 |
+| $x+y$ | 复杂组合 | 19-27 |
+| $x\times y$ | 复杂组合 | 17-41 |
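The shallow constructions can be checked numerically. The sketch below uses Python's `cmath` so that complex intermediate values are allowed, and it only tests positive real inputs, since negative inputs run into branch-cut subtleties. The helper names are assumptions; zero is obtained here as $\ln(1)$.

```python
import cmath
import math

def eml(x, y):
    """The EML operator: exp(x) - ln(y), evaluated over the complex numbers."""
    return cmath.exp(x) - cmath.log(y)

def eml_exp(x):
    """e^x as eml(x, 1), since ln(1) = 0."""
    return eml(x, 1)

def eml_ln(x):
    """ln(x) as eml(1, eml(eml(1, x), 1)).

    Inner steps: eml(1, x) = e - ln x, then exp(e - ln x) = e^e / x,
    then e - ln(e^e / x) = ln x.
    """
    return eml(1, eml(eml(1, x), 1))

e_const = eml(1, 1)   # exp(1) - ln(1) = e
zero = eml_ln(1)      # ln(1) = 0, a depth-3 tree built only from the constant 1
```

Every value above comes from nesting the single operator with the constant 1 (plus the input variable), which is the sense in which the two-button calculator is complete for these functions.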
+
+## 变体算子
+
+$$\begin{align}
+\text{eml}(x,y) &= \exp(x) - \ln(y) & \text{需常量 } 1 \\
+\text{edl}(x,y) &= \exp(x) / \ln(y) & \text{需常量 } e \\
+-\text{eml}(y,x) &= \ln(x) - \exp(y) & \text{需常量 } -\infty
+\end{align}$$
+
+## 约化历程
+
+从 36 个原始操作到 EML 的逐步约化：
+
+1. **Base-36** — 标准科学计算器 (36 个原始操作)
+2. **Calc 3** — 保留 $\exp,\ln,-x,1/x,+$ (6 个)
+3. **Calc 2** — 保留 $\exp,\ln,-$ (4 个)
+4. **Calc 1** — 使用 $x^y,\log_x y$ 和常量 $e$ 或 $\pi$ (4 个)
+5. **Calc 0** — 使用 $\exp$ 和 $\log_x y$ (3 个)
+6. **EML** — 单一二元算子 + 常量 1 (2 个)
+
+## 应用场景
+
+### 符号回归
+EML 树可作为"主公式"架构：
+- 构造固定深度的完整二叉树
+- 每个输入是 $1$、变量 $x$ 或子树结果的线性组合
+- 使用梯度优化（Adam）训练参数
+- 训练后将权重"吸附"到 0/1 精确值
+
+### 模拟电路
+EML 可作为模拟计算的基本构建块，类似于运算放大器。
+
+### 形式化验证
+- 在 Mathematica 和 IEEE754 浮点中工作良好
+- 在 Lean 4 中遇到挑战（因 $\ln(0)=0$ 的"垃圾值"定义）
+- 需要处理扩展实数 ($\pm\infty$) 和复数分支切割
+
+## 与符号回归的联系
+
+EML 树表示使得 [[symbolic-regression]] 可通过梯度下降而非组合搜索实现：
+
+1. **可训练电路**：EML 树成为可微分计算图
+2. **标准优化器**：Adam 等梯度方法可优化树参数
+3. **精确恢复**：在浅层深度（≤4）时，该方法可从数值数据恢复闭式初等函数
+4. **损失地形**：统一结构相比异构表达式树可能提供更优的优化地形
+
+## 与布尔逻辑的类比
+
+| 方面 | 布尔逻辑 | 连续数学 |
+|------|----------|----------|
+| 通用原语 | NAND/NOR 门 | **EML 算子** |
+| 元数 | 2 输入 | 2 输入 |
+| 完备性 | 所有布尔函数 | 所有初等函数 |
+| 结构 | 统一门网络 | 统一 EML 树 |
+| 搜索空间 | 离散 | 连续（可微） |
+
+## 研究意义
+
+1. **神经-符号集成**：桥接神经网络（可微）与符号数学
+2. **发现方法**：通过系统穷举搜索发现——暗示可能存在其他通用原语
+3. **科学发现**：有潜力从数据中自动发现物理定律
+4. **教育意义**：暗示微积分/分析教学的极简基础
+
+## 开放问题
+
+1. **无常量 Sheffer 算子** — 是否存在不需要区分常量的二元算子？
+2. **一元 Sheffer 算子** — 是否存在同时作为激活函数和初等函数生成器的一元算子？
+3. **更好性质的变体** — 是否存在非指数渐近、无定义域问题的类似算子？
+4. **连续族** — EML 是否属于一个更大的连续算子族？
+5. **最小深度** — 特定函数所需的最小 EML 树深度是多少？
+6. **多维推广** — 该方法能否扩展到多元函数和偏微分方程？
+7. **泛化影响** — EML 表示如何影响学习模型的泛化能力？
+
+## 相关页面
+
+- [[odrzywolek-eml-single-operator]] — EML 算子论文
+- [[symbolic-regression]] — 应用领域
+- [[computerized-adaptive-testing]] — CRLB 相关应用
+- [[cramer-rao-lower-bound]] — Fisher 信息与参数估计
--- a/concepts/formal-verification.md
+++ b/concepts/formal-verification.md
@@ -0,0 +1,54 @@
+---
+title: "Formal Verification (形式化验证)"
+created: 2025-04-15
+updated: 2025-04-15
+type: concept
+tags: [concept, mathematics, logic, ai-mathematics, verification]
+sources: [raw/papers/tao-ai-mathematical-methods-2026.md]
+---
+
+# Formal Verification (形式化验证)
+
+## 定义
+
+**Formal Verification** 是使用形式化方法（如一阶逻辑、集合论）来验证数学证明或计算机程序正确性的过程。
+
+## 历史背景
+
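+Mathematics has traditionally had objective standards of proof:
+- These foundations were laid from Euclid through the early twentieth century
+- Even so, arguments written by human mathematicians usually fall short of the ideal of perfect rigor
+- Errors are common; some get corrected, while others become "folklore"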
+
+## 形式化验证的局限
+
+[[Terence Tao]] 在其论文中指出了形式化验证的两个关键局限：
+
+### 1. 翻译问题
+Formal verification only certifies that a formalized argument establishes a formal mathematical statement, but does not rule out errors in translation between the formal statement and the original intended statement.
+
+**Example** (陶哲轩的费马大定理例子)：
+- 费马大定理断言：对于 $n > 2$，方程 $a^n + b^n = c^n$ 没有自然数解
+- 隐含假设：自然数从 1 开始，而非 0
+- 如果 AI 错误地允许 $a, b, c$ 为 0，可能"证明"费马大定理是错误的！
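The translation gap in this example is easy to reproduce with a brute-force search. The sketch below is only an illustration of the mis-specified statement, not a verification of the theorem; the function name and the small search bound are arbitrary assumptions.

```python
def solutions(n, limit, start):
    """All (a, b, c) with start <= a <= b <= limit, start <= c <= limit, a^n + b^n = c^n."""
    return [
        (a, b, c)
        for a in range(start, limit + 1)
        for b in range(a, limit + 1)
        for c in range(start, limit + 1)
        if a**n + b**n == c**n
    ]

# Mis-specified statement: zero is allowed, so trivial "counterexamples" appear.
with_zero = solutions(3, 5, start=0)

# Intended statement: positive integers only, and no solution is found in range.
positive_only = solutions(3, 5, start=1)
```

Triples such as (0, 1, 1) satisfy the equation, so a formal statement that quantifies over naturals starting at 0 is false even though the intended theorem is true.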
+
+### 2. 无法捕捉 "Penumbra"
+即使形式化验证可以确保推理的正确性，它无法捕捉：
+- **Heuristics** 启发式 - 为什么这个方法有效
+- **Motivation** 动机 - 为什么要研究这个问题
+- **Context** 背景 - 如何广泛地理解这个结果
+- **Narrative** 叙事 - 证明的策略和构思
+
+## AI 时代的意义
+
+[[Terence Tao]] 认为：
+- AI 可以自动化形式化证明的生成
+- 但这可能产生 "odorless proofs"（无味证明）：技术上正确，但缺乏启发性
+- 人类数学家需要专注于那些不容易自动验证的方面
+
+## 关联页面
+
+- [[Mathematical methods and human thought in the age of AI]] - 详细讨论
+- [[Terence Tao]] - 该概念的主要阐述者
+- [[lean-mathlib]] - 论文提及的大型形式化数学库
+- [[smell-test]] - "气味测试"概念
--- a/concepts/gravitino-unified-metadata.md
+++ b/concepts/gravitino-unified-metadata.md
@@ -0,0 +1,35 @@
+---
+title: "Gravitino 统一元数据管理"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [system-design, tooling]
+sources: [raw/articles/oppo-multimodal-data-lake-2026.md]
+---
+
+# Gravitino 统一元数据管理
+
+**应用案例:** OPPO 多模态数据湖 (2026)
+
+## 背景
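+In the early stage of building its multimodal data lake, OPPO had hundreds of PB of algorithm data scattered across scripts, with no tracking of owners, usage, or dependencies, which led to severe metadata disorder and data misuse.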
+
+## 核心能力
+
+1. **统一 Catalog**：支持多引擎友好，实现 Hive 表与 Lance 表在同一套目录下的统一管理
+2. **多云分布支持**：适配混合云模式（自建机房 + 阿里云），数据分布对业务无感，简化表与数据迁移
+3. **数据资产全局可感知**：实现目录归属人、每日账单、上下游依赖关系的精准归因，数据治理清晰可控
+
+## 落地策略
+- **收口机制**：强制所有新增目录必须通过 Gravitino 访问，否则拒绝
+- **存量转换**：通过控制增量、逐步转换存量的方式，最终将所有元数据收归统一平台
+
+## 收益
+- 用户侧：一次查询、少搬数据、权限统一
+- 架构侧：元数据集中、易扩展、易治理
+- 支持联邦查询：单条 SQL 跨 Hive/Lance 表 JOIN
+
+## 相关概念
+
+- [[oppo-multimodal-data-lake]] — OPPO 数据湖实践
+- [[curvine-distributed-cache]] — 配套加速层 Curvine
--- a/concepts/human-agent-trust.md
+++ b/concepts/human-agent-trust.md
@@ -0,0 +1,38 @@
+---
+title: "人机信任 (Human-Agent Trust)"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [alignment, research]
+sources: [raw/papers/li-amd-human-perception-2026.md]
+---
+
+# 人机信任 (Human-Agent Trust)
+
+## 背景
+
+随着 LLM Agent 在软件开发、医疗等高风险领域成为受信任的副驾驶（copilots），人机信任问题从理论走向实践。信任的建立与滥用构成了新的安全挑战。
+
+## 核心矛盾
+
+- **信任的必要性**：Agent 需要一定的用户信任才能有效协作
+- **信任的脆弱性**：过度信任导致用户对 Agent 输出缺乏批判性验证
+- **领域专家悖论**：专家在自身领域可能更倾向于信任工具的输出，反而在特定场景下更易受 AMD 攻击
+
+## 研究进展
+
+- **HAT-Lab** (Li et al., 2026)：首个高保真人机信任实验平台，涵盖 9 个真实场景
+- **认知失败模式**：识别了 6 种用户在面对欺骗性 Agent 时的认知失效路径
+- **经验学习**：通过模拟体验，用户可显著提高对 AMD 的警惕性（>90%）
+
+## 防御设计原则
+
+1. **可验证性**：Agent 的输出应易于人类交叉验证
+2. **低成本警告**：安全警告应中断工作流但验证成本低
+3. **信任校准**：帮助用户建立对 Agent 能力的准确预期，避免过度或不足信任
+
+## 相关概念
+
+- [[agent-mediated-deception]] — AMD 攻击与防御
+- [[human-centered-ai]] — 以人为中心的 AI 哲学
+- [[li-amd-human-perception]] — 实证研究论文
--- a/concepts/human-centered-ai.md
+++ b/concepts/human-centered-ai.md
@@ -0,0 +1,43 @@
+---
+title: "Human-Centered AI (以人类为中心的 AI)"
+created: 2025-04-15
+updated: 2025-04-15
+type: concept
+tags: [concept, ai-philosophy, alignment, llm, deep-learning]
+sources: [raw/papers/tao-ai-mathematical-methods-2026.md]
+---
+
+# Human-Centered AI (以人类为中心的 AI)
+
+## 定义
+
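+**Human-Centered AI (HCAI)** is a philosophical framework for developing and applying AI. It holds that AI tools should be designed and used with the core goals of augmenting human capabilities, meeting human needs, and improving quality of life.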
+
+**核心原则**（来自 [[Terence Tao]] 和 [[Tanya Klowden]]）：
+1. AI 是人类历史上为促进思想的创造、组织和传播而发展的工具的自然演进
+2. 必须确保 AI 的发展和应用保持**根本上以人类为中心**
+3. 创新应以满足人类需求为导向
+4. 增进人类思维和理解能力
+
+## 与其他 AI 哲学的区别
+
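+| Stance | Focus | Human-centered view |
+|-------|------|------------|
+| Technological determinism | Technology developing for its own sake | Technology serves people |
+| Efficiency first | Automating and replacing people | Augmenting human capabilities |
+| Instrumentalism | AI as an independent entity | AI as a human tool |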
+
+## 在数学中的应用
+
+在 [[Mathematical methods and human thought in the age of AI]] 中，陶哲轩提出：
+
+- AI 可以处理费力的计算，但人类数学家应专注于启发式、创造性的工作
+- "Smell Test"（气味测试）：好的数学不仅要正确，还要有启发性
+- 不能让 AI 的 "odorless proofs"（无味证明）取代人类的理解和洞察
+
+## 关联页面
+
+- [[Mathematical methods and human thought in the age of AI]] - 详细阐述以人类为中心 AI 的论文
+- [[Terence Tao]] - 该概念的主要倡导者之一
+- [[alignment]] - AI 对齐/安全
+- [[ai-philosophy]] - AI 哲学
--- a/concepts/knowledge-bank.md
+++ b/concepts/knowledge-bank.md
@@ -0,0 +1,96 @@
+---
+title: Knowledge Bank — AI 辅助开发时代的知识管理系统
+created: 2026-04-16
+updated: 2026-04-17
+type: concept
+tags: [knowledge-management, open-source, multi-agent]
+sources: [raw/articles/knowledge-bank-ai-dev-2026.md]
+---
+
+# Knowledge Bank
+
+面向 AI 辅助开发时代的知识管理系统，通过自动捕获、结构化存储和智能检索，让开发团队的知识真正流动起来。
+
+项目仓库: [gabrywu-public/knowledge-bank](https://github.com/gabrywu-public/knowledge-bank)
+
+## 核心洞察
+
+### 转变一：知识受众从"人"变为"机器"
+
+传统知识管理假设知识是给人阅读的（精美文档、结构化 wiki、详细注释），但现实中开发者不会主动看文档，即使看了也记不住、找不到、或已过时。
+
+在 AI 辅助开发时代，**真正的知识消费者是 AI 代码助手**（Claude Code、Cursor、GitHub Copilot）。知识需要结构化、情境化、可检索的格式，让 AI 能快速理解和应用。
+
+### 转变二：三维知识分类体系
+
+不再按主题分类，而是采用 **作用域 + 来源 + 类型** 的三维分类：
+
+| 维度 | 分类 | 说明 |
+|------|------|------|
+| **作用域 (Scope)** | 个人 / 项目 / 组织 | 知识的共享边界，避免知识冲突，实现精准注入 |
+| **来源 (Source)** | AI 观察 > 架构师决策 > Reviewer 偏好 > 开发者经验 | 知识的权威性权重；AI 观察因来自实际代码、可验证、实时性而权重最高 |
+| **类型 (Type)** | 代码模式 / 架构决策 / 配置偏好 / 陷阱警示 / API 用法 | 知识的应用方式 |
+
+**关键设计：AI 观察的可信度最高** —— 这违反直觉但合理，因为 AI 观察直接来自实际代码（可追溯到 commit），反映当前真实状态，而非人为偏好或可能过时的文档。
+
+### 转变三：知识生命周期重构
+
+从 **"写作→发布→被遗忘→过时→删除"** 转变为 **"捕获→检索→应用→收集"**：
+
+- **零摩擦捕获**: 不需要开发者专门写文档，知识在开发过程中自动提取
+- **情境化检索**: 不是被动等待查询，而是主动在需要时注入相关知识
+- **智能去重**: 通过多维度相似度评分（标题 40% + 摘要 30% + 内容 20% + 上下文 10%）自动合并
+- **持续进化**: 知识库随项目发展自动更新和优化
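The weighted deduplication score can be sketched as below. The weights come from the list above and the 0.85 / 0.60 thresholds from the open questions at the end of this page. How each per-field similarity is computed is not specified here, so the sketch assumes they arrive as numbers in [0, 1]; the function names and action labels are hypothetical.

```python
# Field weights as listed above: title 40%, summary 30%, content 20%, context 10%.
WEIGHTS = {"title": 0.40, "summary": 0.30, "content": 0.20, "context": 0.10}

def combined_similarity(field_scores):
    """Weighted sum of per-field similarities, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[field] * field_scores[field] for field in WEIGHTS)

def dedup_action(score, merge_at=0.85, prompt_at=0.60):
    """Map a combined score to an action: merge, prompt the user, or keep both."""
    if score >= merge_at:
        return "merge"
    if score >= prompt_at:
        return "prompt"
    return "keep"

# Hypothetical per-field similarities between a new entry and an existing one.
example = {"title": 0.9, "summary": 0.8, "content": 0.7, "context": 0.5}
example_score = combined_similarity(example)
example_action = dedup_action(example_score)
```

Because the title carries 40% of the weight, two entries with near-identical titles but different bodies can still land in the "prompt" band rather than being merged automatically.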
+
+## 技术架构
+
+### Fork Context（上下文隔离架构）
+
+知识操作（检测、去重、评分）在分叉的隔离环境中执行，不干扰主会话：
+
+1. **会话开始 → 知识注入**: 提取关键词 → 搜索知识 → 相关性评分 → 过滤 → 格式化注入
+2. **会话结束 → 知识收集**: 分析会话记录 → 识别有价值知识点 → 4 项资格检查 → 去重 → 创建/更新知识
+
+优势：主会话保持简洁，复杂分析不干扰用户体验，可并行执行。
+
+### 强制仓库关联 (Repository-Aware)
+
+所有知识和会话必须关联到 Git 仓库（`repository_id NOT NULL`），确保数据完整性和精准检索。
+
+### 完整会话追踪
+
+记录每次开发会话的完整上下文：session_id、仓库、分支、commit、工具使用、文件修改等。
+
+## 知识生命周期七阶段
+
+Knowledge Bank 将知识管理融入软件开发全流程，形成"生长的枝干"：
+
+1. **需求分析**: 自动检索历史需求知识，注入相关业务规则
+2. **架构设计**: 自动注入项目架构规范，收集新的设计决策
+3. **编码开发**: 自动注入编码规范，识别新的代码模式
+4. **测试验证**: 自动注入已知陷阱，收集新的 edge case
+5. **Code Review**: AI 辅助审查，更新 Review 规则
+6. **部署运维**: 基于历史故障经验自动诊断，收集运维知识
+7. **迭代优化**: 追溯完整知识链路，指导优化决策
+
+## 与传统知识管理的对比
+
+| 维度 | 传统方式 | Knowledge Bank |
+|------|----------|----------------|
+| 受众 | 人 | AI（+ 人） |
+| 载体 | 静态文档 | 动态上下文 |
+| 获取方式 | 主动查询 | 自动注入 |
+| 维护方式 | 人工编写 | 自动捕获 |
+| 知识形态 | 散落的金子（孤立、过时） | 生长的枝干（互联、进化） |
+
+## 相关概念
+
+- **多 Agent 工作流**: Knowledge Bank 的多阶段知识采集机制本质上是一种 agent 工作流
+- **持久化知识编译**: 与 Karpathy 的 LLM Wiki 模式形成互补——Knowledge Bank 侧重 AI 辅助开发场景的自动化知识捕获，llm-wiki 侧重持久化知识编译
+- [[computerized-adaptive-testing]] — CAT 的自适应选题本质上是知识注入的精准化：在正确的时间向正确的对象注入正确的测试项，与 Knowledge Bank 的情境化检索有相同的设计哲学
+
+## 开放问题
+
+- Knowledge Bank 的三维分类体系是否可扩展到非代码领域（如科研、写作）？
+- AI 观察的"最高可信度"假设在代码存在 anti-pattern 时是否仍然成立？
+- 知识去重的相似度阈值（0.85 合并 / 0.60 提示）是否经过实证验证？
--- a/concepts/kvcache-transfer.md
+++ b/concepts/kvcache-transfer.md
@@ -0,0 +1,38 @@
+---
+title: "KVCache 传输与优化"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [inference, system-design, performance]
+sources: [raw/papers/qin-prfaas-cross-datacenter-2026.md]
+---
+
+# KVCache 传输与优化 (KVCache Transfer)
+
+## 定义
+
+KVCache 是 LLM 推理过程中缓存的 Key-Value 状态，用于避免重复计算。KVCache 传输指在分离式推理架构中将 prefill 阶段生成的 KVCache 移动到 decode 节点的过程。
+
+## 传输瓶颈
+
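+- **Large size**: for dense-attention models, KVCache size grows linearly with sequence length and scales with model size (number of layers, KV heads, and head dimension)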
+- **带宽要求**：传统架构依赖 RDMA 等低延迟高带宽网络
+- **延迟敏感**：传输延迟直接影响 TTFT（Time to First Token）
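A back-of-the-envelope estimate shows why these three points interact. The sketch below uses the standard per-request size formula for dense attention (one K and one V tensor per layer); the model configuration and link speed are hypothetical, and protocol overhead is ignored.

```python
def kvcache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """KVCache size for one request: K and V per layer, per KV head, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

def transfer_seconds(num_bytes, bandwidth_gbps):
    """Ideal transfer time over a link of the given bandwidth in Gbit/s."""
    return num_bytes * 8 / (bandwidth_gbps * 1e9)

# Hypothetical config: 32 layers, 8 KV heads of dimension 128, fp16, 32K tokens.
size = kvcache_bytes(seq_len=32_768, num_layers=32, num_kv_heads=8, head_dim=128)
seconds = transfer_seconds(size, bandwidth_gbps=100)
```

For this configuration a single 32K-token request carries 4 GiB of KVCache, which takes roughly a third of a second on an ideal 100 Gbit/s link, and that time lands directly on TTFT.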
+
+## 优化方向
+
+### 模型侧
+- **混合注意力架构**：通过结构化状态空间或线性注意力减少 KVCache 大小
+- **KVCache 压缩**：量化、稀疏化或蒸馏技术
+- **前缀缓存共享**：多请求共享公共前缀的 KVCache
+
+### 系统侧
+- **选择性传输**：仅传输必要的 KVCache 层或 token
+- **带宽感知调度**：根据网络状态动态调整传输策略
+- **PrfaaS 架构**：结合模型效率与系统调度，实现跨数据中心传输
+
+## 相关概念
+
+- [[prefill-as-a-service]] — PrfaaS 架构中的 KVCache 传输
+- [[prefill-decode-disaggregation]] — PD 分离架构
+- [[inference-optimization]] — 推理优化技术
--- a/concepts/memory-caching-rnn.md
+++ b/concepts/memory-caching-rnn.md
@@ -0,0 +1,54 @@
+---
+title: "Memory Caching (MC)"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [architecture, deep-learning, llm]
+sources: [raw/papers/behrouz-memory-caching-rnn-2026.md]
+---
+
+# Memory Caching (MC)
+
+**提出者:** Behrouz et al. (2026) · arXiv:2602.24281
+
+## 定义
+
+Memory Caching 是一种增强循环神经网络（RNN）的技术，通过缓存其隐藏状态的检查点（checkpoints），使 RNN 的有效记忆容量能够随序列长度动态增长。
+
+## 动机
+
+Transformer 成为序列建模范式的主要原因是其**记忆容量随上下文长度增长**的特性，这使得检索任务表现优异。然而，这也带来了 $O(L^2)$ 的二次复杂度。近年来研究者探索了次二次复杂度的 RNN 替代方案，但 RNN 在回忆密集型任务中表现不佳，通常归因于其**固定大小的记忆**限制。
+
+## 技术原理
+
+MC 的核心思想：在 RNN 前向传播过程中，定期保存隐藏状态的快照。当需要回忆历史信息时，可以从这些缓存的检查点恢复，而不是仅依赖当前隐藏状态。
+
+### 四种变体
+
+1. **基础 MC** — 均匀间隔缓存
+2. **门控聚合 MC** — 使用门控机制选择性地缓存重要状态
+3. **稀疏选择 MC** — 稀疏化缓存策略
+4. **深层 MC** — 应用于深层记忆模块
+
+### 复杂度插值
+
+MC 提供了一个可调节的超参数，控制缓存频率，从而在 $O(L)$（传统 RNN）和 $O(L^2)$（Transformer）之间实现灵活插值：
+- 缓存频率 = 0 → 等价于标准 RNN
+- 缓存频率 = 1 → 每步都缓存，接近 Transformer 的记忆能力
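The interpolation can be illustrated with a toy model. The sketch below is not the paper's architecture: it uses a scalar `tanh` recurrence, expresses the cache frequency as an interval `cache_every` (0 disables caching, 1 caches every step), and simply counts how many stored states each step would have to read if it consulted every checkpoint. How cached states are actually aggregated is left out.

```python
import math

def run_with_checkpoints(inputs, cache_every):
    """Toy RNN with uniformly spaced hidden-state checkpoints.

    Returns (final_state, checkpoints, state_reads), where state_reads counts
    the current state plus all cached checkpoints visible at each step.
    """
    h = 0.0
    checkpoints = []
    state_reads = 0
    for t, x in enumerate(inputs, start=1):
        h = math.tanh(0.5 * h + x)            # toy recurrent update
        state_reads += 1 + len(checkpoints)   # work grows with the cache size
        if cache_every and t % cache_every == 0:
            checkpoints.append(h)
    return h, checkpoints, state_reads

inputs = [0.1] * 1000
_, no_cache, reads_rnn = run_with_checkpoints(inputs, cache_every=0)
_, sparse_cache, reads_sparse = run_with_checkpoints(inputs, cache_every=100)
_, full_cache, reads_full = run_with_checkpoints(inputs, cache_every=1)
```

For a sequence of length 1000 the read counts are 1000 without caching, 5500 with a checkpoint every 100 steps, and 500500 when every step is cached, which traces the range from $O(L)$ to $O(L^2)$.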
+
+## 实验结果
+
+- **语言建模**：MC 提升 RNN 性能
+- **长上下文理解**：MC 变体表现接近 Transformer
+- **上下文回忆任务**：优于 SOTA RNN，接近 Transformer
+
+## 开放问题
+
+- 缓存检查点的最优策略是什么？
+- MC 与其他次二次架构（Mamba、RWKV）的结合效果如何？
+- 在实际部署中，缓存带来的内存开销与性能增益的最佳平衡点在哪里？
+
+## 相关概念
+
+- [[behrouz-memory-caching-rnn]] — 原始论文笔记
+- [[subquadratic-transformer-alternatives]] — 次二次 Transformer 替代方案
--- a/concepts/mixture-of-depths-attention.md
+++ b/concepts/mixture-of-depths-attention.md
@@ -0,0 +1,59 @@
+---
+title: "Mixture-of-Depths Attention (MoDA)"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [architecture, deep-learning, transformer]
+sources: [raw/papers/zhu-moda-mixture-of-depths-2026.md]
+---
+
+# Mixture-of-Depths Attention (MoDA)
+
+**提出者:** Zhu et al. (2026) · arXiv:2603.15619
+
+## 定义
+
+MoDA 是一种改进的注意力机制，旨在解决深层 Transformer 模型中的**信号退化**问题。它允许每个注意力头在计算注意力时，不仅关注当前层的序列 KV，还能直接访问前序若干层的深度 KV，形成跨层的信息通路。
+
+## 动机：信号退化 (Signal Degradation)
+
+在标准 Transformer 中，信息通过残差连接逐层传递。随着网络深度增加：
+- 浅层提取的精细特征在多次残差更新中被逐渐"稀释"
+- 深层网络难以有效利用浅层形成的关键信息
+- 简单的残差连接不足以保留所有重要特征
+
+## 机制设计
+
+### 核心思想
+每个注意力头的查询 $Q$ 不仅与当前层的 $K, V$ 计算注意力，还与前序 $D$ 层的 $K, V$ 计算注意力：
+$$\text{MoDA}(Q_l) = \text{Softmax}\left(\frac{Q_l [K_{l-D:l}]^T}{\sqrt{d}}\right) V_{l-D:l}$$
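The formula can be read as ordinary softmax attention over keys and values concatenated across the current layer and the $D$ preceding layers. The sketch below handles one query vector in plain Python, under the assumptions that `keys_by_layer` lists layers from shallow to deep with the current layer last, and that masking, multiple heads, and the hardware-efficient layout are out of scope. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moda_attend(query, keys_by_layer, values_by_layer, depth):
    """Attend one query over the current layer plus `depth` preceding layers."""
    keys = [k for layer in keys_by_layer[-(depth + 1):] for k in layer]
    values = [v for layer in values_by_layer[-(depth + 1):] for v in layer]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

# Two layers with one token each; a zero query gives uniform attention weights.
keys_by_layer = [[[1.0, 0.0]], [[0.0, 1.0]]]
values_by_layer = [[[2.0, 0.0]], [[0.0, 4.0]]]
cross_layer = moda_attend([0.0, 0.0], keys_by_layer, values_by_layer, depth=1)
current_only = moda_attend([0.0, 0.0], keys_by_layer, values_by_layer, depth=0)
```

With `depth=0` the call reduces to standard single-layer attention, while `depth=1` mixes in the earlier layer's value, which is the cross-layer pathway the mechanism adds.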
+
+### 硬件高效实现
+- **挑战**：跨层 KV 访问导致非连续内存访问，降低 GPU 利用率
+- **解决方案**：设计专门的内存访问算法，重组 KV 缓存布局
+- **性能**：在 64K 序列长度下达到 FlashAttention-2 的 97.3% 效率
+
+## 实验表现
+
+| 指标 | 基线 | MoDA | 提升 |
+|------|------|------|------|
+| 平均困惑度 (10 benchmarks) | - | -0.2 | ✓ |
+| 下游任务性能 (10 tasks) | - | +2.11% | ✓ |
+| FLOPs 开销 | 1.0x | 1.037x | +3.7% |
+
+## 归一化位置
+
+- **Post-Norm** + MoDA > **Pre-Norm** + MoDA
+- 这与标准 Transformer 的常见实践（Pre-Norm 更稳定）不同，表明 MoDA 改变了梯度流动特性
+
+## 开放问题
+
+- MoDA 与混合注意力架构的结合效果？
+- 在超大规模模型（>100B）上的扩展性如何？
+- 是否可以与 [[memory-caching-rnn]] 等技术结合？
+
+## 相关概念
+
+- [[zhu-moda-mixture-of-depths]] — 原始论文
+- [[depth-scaling-llms]] — LLM 深度扩展
+- [[signal-degradation]] — 信号退化问题
--- a/concepts/prefill-as-a-service.md
+++ b/concepts/prefill-as-a-service.md
@@ -0,0 +1,59 @@
+---
+title: "Prefill-as-a-Service (PrfaaS)"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [inference, system-design, architecture]
+sources: [raw/papers/qin-prfaas-cross-datacenter-2026.md]
+---
+
+# Prefill-as-a-Service (PrfaaS)
+
+**提出者:** Qin et al. (2026) · arXiv:2604.15039
+
+## 定义
+
+PrfaaS 是一种跨数据中心的 LLM 服务架构，通过选择性地将长上下文 prefill 卸载到独立的计算密集型集群，并通过商用以太网将 KVCache 传输到本地 decode 集群，实现 prefill 和 decode 容量的独立扩展。
+
+## 动机
+
+传统的 [[prefill-decode-disaggregation]] 架构虽然分离了计算密集型的 prefill 和内存密集型的 decode 阶段，但受限于 KVCache 的传输成本：
+- **Dense-attention 模型**：KVCache 体积巨大，需要低延迟 RDMA 网络
+- **混合注意力模型**：KVCache 大幅减小，但真实负载特性（突发、长度偏斜、带宽波动）仍使简单的外部化设计面临拥塞和低利用率问题
+
+## 架构设计
+
+### 核心组件
+1. **独立 Prefill 集群**：计算密集型，专门处理长上下文 prefill
+2. **本地 PD 集群**：接收 KVCache 后执行 decode
+3. **带宽感知调度器**：根据跨数据中心带宽波动动态调整卸载策略
+4. **缓存感知请求放置**：利用现有前缀缓存优化请求路由
+
+### 关键技术
+- **选择性卸载**：仅对长上下文请求进行跨数据中心 prefill 卸载
+- **KVCache 高效传输**：通过商用以太网（无需 RDMA）传输
+- **系统侧与模型侧协同**：结合模型 KV 效率优化与系统调度
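Selective offloading can be pictured as a per-request decision that weighs prompt length against the cost of moving the KVCache. The sketch below is an illustrative heuristic and not the scheduler from the paper: the function name, the token threshold, and the transfer-time budget are all hypothetical.

```python
def should_offload(prompt_tokens, kv_bytes, bandwidth_gbps,
                   min_tokens=8192, max_transfer_s=1.0):
    """Offload prefill only for long prompts whose KVCache fits the link budget.

    min_tokens and max_transfer_s are made-up tuning knobs for illustration.
    """
    if prompt_tokens < min_tokens:
        return False  # short prompts are cheaper to prefill locally
    transfer_s = kv_bytes * 8 / (bandwidth_gbps * 1e9)
    return transfer_s <= max_transfer_s

short_prompt = should_offload(1_000, kv_bytes=10**9, bandwidth_gbps=100)
long_prompt = should_offload(32_768, kv_bytes=10**9, bandwidth_gbps=100)
congested = should_offload(32_768, kv_bytes=10**11, bandwidth_gbps=10)
```

A bandwidth-aware scheduler would feed a measured, time-varying bandwidth into such a check, so the same request may be offloaded at one moment and kept local at another.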
+
+## 性能表现
+
+基于内部 1T 参数混合模型：
+- 吞吐量比同构 PD 部署高 **54%**
+- 吞吐量比朴素异构基线高 **32%**
+- 跨数据中心带宽消耗适度
+
+## 意义
+
+PrfaaS 解除了"异构加速器必须共享同一低延迟 RDMA fabric"的限制，使得 LLM 服务可以更灵活地部署在松散耦合的集群中，为云原生 LLM 服务提供了新的架构范式。
+
+## 开放问题
+
+- How should the prefill offloading threshold be chosen adaptively?
+- PrfaaS 在多租户环境下的隔离与调度策略？
+- 对纯 dense-attention 模型的适用性边界？
+
+## 相关概念
+
+- [[qin-prfaas-cross-datacenter]] — 原始论文
+- [[prefill-decode-disaggregation]] — PD 分离架构
+- [[kvcache-transfer]] — KVCache 传输优化
+- [[hybrid-attention-models]] — 混合注意力架构
--- a/concepts/prefill-decode-disaggregation.md
+++ b/concepts/prefill-decode-disaggregation.md
@@ -0,0 +1,38 @@
+---
+title: "Prefill-Decode 分离架构 (PD Disaggregation)"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [inference, system-design, architecture]
+sources: [raw/papers/qin-prfaas-cross-datacenter-2026.md]
+---
+
+# Prefill-Decode 分离架构 (PD Disaggregation)
+
+## 定义
+
+将 LLM 推理的两个主要阶段——**Prefill**（处理 prompt，计算密集型）和 **Decode**（自回归生成 token，内存密集型）——分离到不同的硬件或集群上执行，以优化资源利用率。
+
+## 演进背景
+
+1. **同构部署**：Prefill 和 Decode 在同一 GPU 上执行，资源利用率低
+2. **PD 分离**：将两者分离，分别优化计算和内存资源
+3. **跨数据中心 PD 分离**：PrfaaS 架构进一步打破网络域限制，实现跨数据中心的资源弹性
+
+## 核心挑战
+
+- **KVCache 传输成本**：Dense-attention 模型产生巨大的 KVCache，需要高带宽低延迟网络（RDMA）
+- **负载不均衡**：Prefill 和 Decode 的峰值时间不同，但传统架构受限于网络拓扑
+- **异构部署困难**：不同代际或类型的加速器难以在同一网络域内协同
+
+## 最新进展
+
+- **混合注意力架构**（如 Hyena、基于状态空间的模型）大幅减少 KVCache 大小
+- **PrfaaS** (Qin et al., 2026)：结合模型侧 KV 效率与系统侧选择性卸载，实现跨数据中心 PD 分离
+- **商用以太网替代 RDMA**：降低部署成本和复杂性
+
+## 相关概念
+
+- [[prefill-as-a-service]] — PrfaaS 架构
+- [[kvcache-transfer]] — KVCache 传输优化
+- [[hybrid-attention-models]] — 混合注意力架构
--- a/concepts/subquadratic-transformer-alternatives.md
+++ b/concepts/subquadratic-transformer-alternatives.md
@@ -0,0 +1,49 @@
+---
+title: "次二次 Transformer 替代方案"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [architecture, deep-learning, llm]
+sources: [raw/papers/behrouz-memory-caching-rnn-2026.md]
+---
+
+# 次二次 Transformer 替代方案 (Subquadratic Transformer Alternatives)
+
+## 问题定义
+
+Transformer 的核心瓶颈在于自注意力机制的 $O(L^2)$ 计算和内存复杂度，限制了其在长序列上的应用。近年来涌现了多种次二次复杂度的替代架构。
+
+## 主要方向
+
+### RNN 类
+- **传统 RNN/LSTM/GRU** — $O(L)$ 复杂度，但固定记忆限制回忆能力
+- **Memory Caching (MC)** — 通过缓存检查点扩展 RNN 记忆 [[memory-caching-rnn]]
+- **Mamba/State Space Models** — 结构化状态空间，$O(L)$ 复杂度
+- **RWKV** — 结合 Transformer 和 RNN 优势
+
+### 线性注意力
+- **Linear Transformers** — 通过核方法将注意力线性化
+- **Performer** — 使用随机特征近似的线性注意力
+
+### 其他
+- **Hyena** — 基于长卷积的序列模型
+- **Griffin** — 门控卷积与线性注意力的混合
+
+## 核心权衡
+
+| 架构类型 | 复杂度 | 记忆能力 | 并行训练 |
+|----------|--------|----------|----------|
+| Transformer | $O(L^2)$ | ★★★★★ | ✓ |
+| MC-RNN | $O(L)$~$O(L^2)$ | ★★★★ | ✗ |
+| SSM/Mamba | $O(L)$ | ★★★☆ | 部分 |
+| Linear Attn | $O(L)$ | ★★★ | ✓ |
+
+## 开放问题
+
+- 是否存在一种架构能同时实现 $O(L)$ 复杂度和 Transformer 级别的回忆能力？
+- Memory Caching 是否可推广到其他次二次架构？
+
+## 相关概念
+
+- [[memory-caching-rnn]] — Memory Caching 技术
+- [[behrouz-memory-caching-rnn]] — MC 原始论文
--- a/concepts/symbolic-regression.md
+++ b/concepts/symbolic-regression.md
@@ -0,0 +1,100 @@
+---
+title: "Symbolic Regression"
+created: 2026-04-16
+updated: 2026-04-17
+type: concept
+tags: [optimization, training, model]
+sources: [raw/papers/odrzywolek-eml-universal-operator-2026.md]
+---
+
+# Symbolic Regression
+
+**Symbolic regression** is a machine learning technique that discovers explicit mathematical expressions from data, rather than fitting fixed-form models. Unlike traditional regression (which optimizes parameters within a predetermined functional form), symbolic regression searches the space of possible equation structures.
+
+## Core Problem
+
+Given data points (xᵢ, yᵢ), find a closed-form expression f such that y ≈ f(x), where f is composed of elementary operations and functions.
+
+**Key Distinction:**
+- Traditional regression: y = β₀ + β₁x + β₂x² (form fixed, optimize β)
+- Symbolic regression: Discover that y = sin(2πx) · e^(-x²) from data
+
+## Traditional Approaches
+
+### Genetic Programming
+
+The dominant approach historically:
+- **Representation**: Expression trees with heterogeneous nodes (+, -, ×, ÷, sin, exp, etc.)
+- **Search**: Evolutionary algorithms (mutations, crossovers)
+- **Fitness**: Mean squared error or complexity-penalized metrics
+- **Tools**: Eureqa, gplearn, PySR
+
+**Limitations:**
+- Discrete search space (combinatorial explosion)
+- Slow convergence for complex expressions
+- No gradient information
+- Brittle to hyperparameters
+
+### Sparse Regression (SINDy)
+
+- Assumes sparse linear combination from a library of candidate functions
+- Uses LASSO/sparse optimization
+- Faster but limited to linear combinations of basis functions
+
+## Gradient-Based Approaches
+
+Recent work enables differentiable symbolic regression:
+
+### EML Trees (2026)
+
+[[eml-universal-operator|Odrzywołek's EML representation]] enables gradient-based optimization:
+- Uniform tree structure (all nodes are `eml` operators)
+- Fully differentiable
+- Optimizable with standard deep learning optimizers (Adam)
+- Can recover exact closed forms at shallow depths (≤4)
+
+### Neural Symbolic Methods
+
+- **AI Feynman**: Combines neural network fitting with symbolic property testing
+- **Symbolic GPT**: Transformer-based generation of expressions
+- **Deep Symbolic Regression**: Neural networks predicting expression trees
+
+## Evaluation Metrics
+
+1. **Accuracy**: R², MSE, NMSE on held-out data
+2. **Complexity**: Number of nodes, operators, or description length
+3. **Pareto Frontier**: Trade-off between accuracy and simplicity
+4. **Exact Recovery**: Whether the true underlying formula is found
+5. **Generalization**: Performance on out-of-distribution data
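The Pareto frontier in item 3 can be computed with a simple dominance check. The candidate expressions and their scores below are made up for illustration, and the function name is an assumption; lower is better for both complexity and error.

```python
def pareto_front(candidates):
    """Names of candidates not dominated in (complexity, error).

    A candidate is dominated if another one is no worse on both criteria
    and strictly better on at least one.
    """
    front = []
    for name, complexity, error in candidates:
        dominated = any(
            c2 <= complexity and e2 <= error and (c2 < complexity or e2 < error)
            for _, c2, e2 in candidates
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (expression, node count, validation error) triples.
candidates = [
    ("x", 1, 0.90),
    ("x**2", 3, 0.40),
    ("sin(x)", 2, 0.40),
    ("sin(x) + x", 4, 0.35),
    ("degree-9 polynomial", 19, 0.36),
]
front = pareto_front(candidates)
```

Here `x**2` drops out because `sin(x)` matches its error with fewer nodes, and the large polynomial drops out because a much smaller expression is also more accurate.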
+
+## Applications
+
+| Domain | Example |
+|--------|---------|
+| Physics | Discovering force laws, equations of state |
+| Chemistry | Reaction kinetics, structure-property relationships |
+| Biology | Population dynamics, gene regulatory networks |
+| Engineering | System identification, control laws |
+| Finance | Discovering pricing formulas, risk models |
+
+## Challenges
+
+1. **Scalability**: Exponential growth of expression space with size
+2. **Noise Sensitivity**: Overfitting to data noise
+3. **Non-uniqueness**: Multiple expressions may fit data equally well
+4. **Dimensional Analysis**: Incorporating physical units/constraints
+5. **Interpretability**: Balancing accuracy with human-understandable forms
+
+## Future Directions
+
+- Integration with large language models for prior knowledge
+- Physics-informed constraints (conservation laws, symmetries)
+- Multi-objective optimization (accuracy, simplicity, generalization)
+- Real-time/online symbolic regression
+- Human-in-the-loop discovery workflows
+
+## Related Concepts
+
+- [[eml-universal-operator]]: A universal operator enabling gradient-based symbolic regression
+- [[andrzej-odrzywolek]]: Researcher who discovered the EML universal operator
+- [[computerized-adaptive-testing]]: CAT 中的动态选题策略与符号回归中的自适应搜索在"探索-利用权衡"上有结构相似性