myWiki/concepts/bayesian-attention-trilogy.md
2026-06-01 10:46:01 +08:00

---
title: "Bayesian Attention Trilogy"
created: 2026-05-26
type: concept
tags: ["bayesian-inference", "transformers", "research-program"]
sources: ["agarwal-bayesian-attention-geometry"]
---
# Bayesian Attention Trilogy
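> A unified argument, built across three papers, for Bayesian inference in Transformers: from existence, to the emergence mechanism, to compositional extension.
## Structure of the Trilogy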
### Paper I: The Bayesian Geometry of Transformer Attention
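- **Role**: Lemma 1, establishing **existence**
- **Content**: demonstrates, in a [[bayesian-wind-tunnels|Bayesian wind tunnel]], that small Transformers implement the exact Bayesian posterior
- **Findings**: the [[inference-primitives|three-primitive]] framework + [[bayesian-attention-geometry|geometric diagnostics]]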
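
To make "exact Bayesian posterior" concrete, here is a minimal sketch of the kind of ground truth a controlled task can supply. The task (a coin with three candidate biases), the function names, and the KL-based comparison are all assumptions chosen for illustration; they are not taken from Paper I.

```python
import math

def exact_posterior(prior, head_probs, observations):
    """Sequential Bayes update over a finite hypothesis set.

    prior:        prior probability of each hypothesis
    head_probs:   P(heads) under each hypothesis (hypothetical toy task)
    observations: sequence of 1 (heads) / 0 (tails)
    """
    post = list(prior)
    for obs in observations:
        # Multiply by the likelihood of this observation under each hypothesis.
        post = [p * (h if obs == 1 else 1.0 - h) for p, h in zip(post, head_probs)]
        # Renormalize so the posterior sums to one.
        total = sum(post)
        post = [p / total for p in post]
    return post

def kl_divergence(p, q):
    """KL(p || q), one possible way to score a model's output q against p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

head_probs = [0.2, 0.5, 0.8]       # three candidate coin biases (assumed)
prior = [1 / 3, 1 / 3, 1 / 3]      # uniform prior (assumed)
flips = [1, 1, 0, 1]               # three heads, one tail

posterior = exact_posterior(prior, head_probs, flips)
# Posterior predictive probability that the next flip is heads.
predictive_heads = sum(p * h for p, h in zip(posterior, head_probs))
```

Because the posterior is available in closed form, a model's predicted distribution could be scored against it directly, for example with `kl_divergence(posterior, model_output)`, where `model_output` is whatever distribution the model under test produces.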
### Paper II: Gradient Dynamics
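- **Role**: explains **why**
- **Content**: Bayesian structure **emerges naturally** from cross-entropy gradient dynamics
- **Argument**: this is no coincidence; it is the result that training inevitably converges to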
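
One standard fact sits behind this kind of claim: expected cross-entropy is minimized exactly when the predicted distribution equals the true one. The sketch below illustrates only that fact on a single softmax output. It does not reproduce Paper II's analysis, and the target distribution, learning rate, and step count are assumed values.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fit_by_cross_entropy(target, steps=2000, lr=0.5):
    """Gradient descent on the expected cross-entropy -sum_i target_i * log q_i.

    With q = softmax(logits), the gradient with respect to the logits
    is simply q - target.
    """
    logits = [0.0] * len(target)
    for _ in range(steps):
        probs = softmax(logits)
        logits = [z - lr * (q - p) for z, q, p in zip(logits, probs, target)]
    return softmax(logits)

# Stand-in for a true posterior predictive distribution (assumed values).
target = [0.1, 0.3, 0.6]
learned = fit_by_cross_entropy(target)
```

In this toy case the loss is convex in the logits, so gradient descent drives `learned` to `target`. Whether and how the same happens inside a full Transformer is the question Paper II addresses.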
### Paper III: Composition in Partially Observed Settings
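- **Role**: demonstrates **extensibility**
- **Content**: how the primitives **compose** in partially observed settings (closer to natural language)
- **Argument**: composing simple primitives produces complex inference behavior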
## Unified Argument
```
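Paper I:   Can a Transformer perform exact Bayesian inference?  → Yes (existence)
Paper II:  Is this coincidence or necessity?                    → Necessity (emergence mechanism)
Paper III: Do these capabilities extend to realistic settings?  → Yes (compositional extension)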
```
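## Methodological Value
The trilogy demonstrates a complete research paradigm: from **verifiable controlled experiments** (Paper I), to **theoretical explanation** (Paper II), to **generalization toward realistic settings** (Paper III). This matches the [[bayesian-wind-tunnels|wind tunnel]] methodology: first establish ground truth in a controlled environment, then increase complexity step by step.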
## Related Pages
- [[agarwal-bayesian-attention-geometry]]: details of Paper I
- [[bayesian-wind-tunnels]]: the core experimental method
- [[inference-primitives]]: the theoretical framework that runs through the trilogy