20260601
46
concepts/bayesian-attention-trilogy.md
Normal file
@@ -0,0 +1,46 @@
---
title: "Bayesian Attention Trilogy"
created: 2026-05-26
type: concept
tags: ["bayesian-inference", "transformers", "research-program"]
sources: ["agarwal-bayesian-attention-geometry"]
---

# Bayesian Attention Trilogy

> A unified argument built across three papers: Bayesian inference in Transformers, from existence, to the mechanism of emergence, to compositional extension.

## Trilogy Structure

### Paper I: The Bayesian Geometry of Transformer Attention
- **Role**: Lemma 1, establishing **existence**
- **Content**: demonstrates, in a [[bayesian-wind-tunnels|Bayesian wind tunnel]], that small Transformers implement the exact Bayesian posterior
- **Findings**: the [[inference-primitives|three-primitive]] framework plus [[bayesian-attention-geometry|geometric diagnostics]]
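
To make "implements the exact Bayesian posterior" concrete, the following is a minimal sketch of what such a check could look like. It assumes a toy Beta-Bernoulli task, chosen here only because its posterior predictive has a closed form; it is not necessarily one of the tasks used in Paper I. The function names and the `model_probs` values are hypothetical stand-ins for a trained model's outputs.

```python
import math

def exact_posterior_predictive(observations, alpha=1.0, beta=1.0):
    """P(next = 1 | observations) under a Beta(alpha, beta) prior on the coin bias."""
    heads = sum(observations)
    tails = len(observations) - heads
    return (alpha + heads) / (alpha + beta + heads + tails)

def bernoulli_kl(p, q, eps=1e-12):
    """KL(Bernoulli(p) || Bernoulli(q)) in nats, with q clamped away from 0 and 1."""
    q = min(max(q, eps), 1.0 - eps)
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def posterior_gap(sequence, model_probs):
    """Mean KL between the exact predictive and the model's predictive.

    model_probs[t] is the model's P(next = 1) after seeing sequence[:t].
    """
    gaps = []
    for t, q in enumerate(model_probs):
        p = exact_posterior_predictive(sequence[:t])
        gaps.append(bernoulli_kl(p, q))
    return sum(gaps) / len(gaps)

sequence = [1, 0, 1, 1]
# Hypothetical model outputs (assumption): predictions made before each token.
model_probs = [0.5, 2 / 3, 0.5, 0.6]
print(posterior_gap(sequence, model_probs))
```

A gap near zero at every prefix is what "exact" would mean in this toy setting; a model that memorized or used a different prior would show a positive gap.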

### Paper II: Gradient Dynamics
- **Role**: explains **why**
- **Content**: Bayesian structure **emerges naturally** from cross-entropy gradient dynamics
- **Argument**: not a coincidence, but the outcome that training necessarily converges to
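
One standard fact helps explain why cross-entropy training can target Bayesian behavior at all. The sketch below is not Paper II's gradient-dynamics analysis; it only shows that the population-level minimizer of the cross-entropy loss is the Bayesian posterior predictive. The symbols ($\pi$ for the prior over tasks $\theta$, $q$ for the model's predictive distribution) are notation introduced here for illustration.

$$
\mathcal{L}(q)
= \mathbb{E}_{\theta \sim \pi}\,\mathbb{E}_{x_{1:t+1} \sim p(\cdot \mid \theta)}\big[-\log q(x_{t+1} \mid x_{1:t})\big]
= \mathbb{E}_{x_{1:t}}\Big[ H\big(p(\cdot \mid x_{1:t})\big) + \mathrm{KL}\big(p(\cdot \mid x_{1:t}) \,\|\, q(\cdot \mid x_{1:t})\big) \Big]
$$

$$
p(x_{t+1} \mid x_{1:t}) = \int p(x_{t+1} \mid \theta, x_{1:t})\, p(\theta \mid x_{1:t})\, d\theta
$$

Because the entropy term does not depend on $q$ and the KL term is non-negative, the loss is minimized exactly when $q(\cdot \mid x_{1:t}) = p(\cdot \mid x_{1:t})$ for every context that occurs with positive probability. Whether gradient descent actually reaches this minimizer, and through which internal structure, is the question Paper II addresses.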
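
### Paper III: Composition in Partially Observed Settings
- **Role**: demonstrates **extensibility**
- **Content**: how the primitives **compose** in partially observed settings (closer to natural language)
- **Argument**: composing simple primitives produces complex inference behavior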
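
## Unified Argument

```
Paper I:   Can Transformers perform exact Bayesian inference?   → Yes (existence)
Paper II:  Is this a coincidence or a necessity?                → Necessity (emergence mechanism)
Paper III: Do these capabilities extend to realistic settings?  → Yes (compositional extension)
```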
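
## Methodological Value

The trilogy demonstrates a complete research paradigm: from **verifiable controlled experiments** (Paper I), to **theoretical explanation** (Paper II), to **generalization toward realistic settings** (Paper III). This matches the [[bayesian-wind-tunnels|wind tunnel]] methodology: first establish basic facts in a controlled environment, then increase complexity step by step.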

## Related Pages

- [[agarwal-bayesian-attention-geometry]]: details of Paper I
- [[bayesian-wind-tunnels]]: the core experimental method
- [[inference-primitives]]: the theoretical framework running through the trilogy