20260617: 914 pages so far
63
papers/dead-directions-geometric-singular-learning.md
Normal file
@@ -0,0 +1,63 @@
---
title: "Dead Directions: Geometric Singular Learning Theory"
created: 2026-06-10
updated: 2026-06-10
type: paper
tags: ["singular-learning-theory", "information-geometry", "fisher-metric", "deep-learning-theory", "optimization"]
sources: ["https://arxiv.org/abs/2606.05957"]
---

# Dead Directions: Geometric Singular Learning
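
**Author**: Tejas Pradeep Shirodkar (IIIT Hyderabad)
**Venue**: arXiv:2606.05957v1 [cs.LG, stat.ML], 2026 | 139 pages

## Core Problem

[[singular-learning-theory|Singular learning theory]] (Watanabe) and [[information-geometry|information geometry]] (Amari) study the same parameter space but use almost disjoint vocabularies:
- **SLT**: computes Bayesian invariants in analytic coordinates (which requires a Hironaka resolution of singularities)
- **Information geometry**: works in the original coordinates and assumes the Fisher metric is non-degenerate, an assumption that overparameterized models often violate

**The gap**: information about the singular structure lives in Watanabe's framework, but not in coordinates that practitioners can use.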
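
## Dead Direction: The Bridging Primitive

A **[[dead-direction|Dead Direction]]** is a unit vector along a direction in which the Fisher metric degenerates. It is simultaneously Amari's "kernel-approach direction" and Watanabe's "tangent vector to the analytic singular set".

Core insight: the KL order k can be recovered from the decay rate of the directional Fisher curvature, in the original parameter coordinates, without a Hironaka resolution of singularities.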
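
As a rough illustration of the definition above, the sketch below finds the degenerate directions of an empirical Fisher matrix in raw coordinates. The toy model `y ~ N(a*b*x, 1)`, the helper names `fisher_product_model` and `dead_directions`, and the tolerance are all assumptions made for this example. They are not taken from the paper, whose exact construction may differ.

```python
import numpy as np

def fisher_product_model(a, b, x):
    """Empirical Fisher of the toy model y ~ N(a*b*x, 1) w.r.t. (a, b)."""
    # Gradient of the mean a*b*x with respect to (a, b), one row per sample.
    J = np.stack([b * x, a * x], axis=1)  # shape (n, 2)
    return J.T @ J / len(x)

def dead_directions(F, tol=1e-8):
    """Unit eigenvectors of F whose eigenvalue is numerically zero."""
    evals, evecs = np.linalg.eigh(F)  # ascending eigenvalues, orthonormal columns
    mask = evals < tol * max(evals.max(), 1.0)
    return evecs[:, mask].T, evals

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
F = fisher_product_model(1.0, 2.0, x)
dirs, evals = dead_directions(F)
print("eigenvalues:", evals)
print("dead direction:", dirs[0])
```

In this toy case the Fisher matrix has rank one, and the single degenerate direction is proportional to `(a, -b)`, the direction that leaves the product `a*b` unchanged to first order.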
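
## Three Pillars

### 1. Static Rate

Along a dead direction u, the directional Fisher quadratic form satisfies:
```
u^T F(theta(t)) u = Theta(t^{2(k-1)})
```
The KL order k can be read directly from the decay slope of the Fisher eigenvalue: on a log-log plot of u^T F(theta(t)) u against t the slope is 2(k-1), so k = 1 + slope/2.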
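
The slope reading can be sketched numerically. The example below assumes a one-parameter toy model `y ~ N(theta**k * x, 1)` approached along `theta(t) = t`, for which the KL divergence to the true model scales as `t^(2k)` and the Fisher information as `t^(2(k-1))`. The model and the helper names `directional_fisher` and `estimate_kl_order` are hypothetical, chosen only to illustrate the rate formula, and are not the paper's estimator.

```python
import numpy as np

def directional_fisher(t, k, x):
    """Fisher information of y ~ N(theta**k * x, 1) at theta = t."""
    # d(mean)/d(theta) = k * theta**(k-1) * x; with unit noise variance the
    # Fisher information is the mean square of this derivative.
    grad = k * t ** (k - 1) * x
    return np.mean(grad ** 2)

def estimate_kl_order(ts, fisher_vals):
    """Fit the log-log slope and invert slope = 2(k-1)."""
    slope, _ = np.polyfit(np.log(ts), np.log(fisher_vals), 1)
    return 1.0 + slope / 2.0

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
ts = np.logspace(-3, -1, 20)  # points approaching the singularity at t = 0
vals = np.array([directional_fisher(t, 3, x) for t in ts])
k_hat = estimate_kl_order(ts, vals)
print("estimated KL order:", k_hat)
```

Because the toy Fisher values follow an exact power law, the fit recovers k = 3 up to floating-point error. On a real network the curve would only be asymptotically a power law, so the range of t used for the fit would matter.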
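
### 2. K-FAC Factorization for Deep Networks
Multi-layer K-FAC writes each Fisher block as the product of an activation-side rate and a gradient-side rate, and the two are dual to each other. The result is instantiated for modern network primitives: residual streams, layer normalization, and attention.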
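
To see why a Kronecker-factored block yields a product of two rates, recall that standard K-FAC approximates a layer's Fisher block as the Kronecker product of an activation covariance `A` and an output-gradient covariance `G`, and that the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues. The sketch below checks this on random data. The array shapes, the `A ⊗ G` ordering, and the helper name `kfac_block` are assumptions for illustration, not the paper's notation.

```python
import numpy as np

def kfac_block(acts, grads):
    """Kronecker-factored Fisher block from layer inputs and output gradients."""
    A = acts.T @ acts / len(acts)     # activation-side second moment
    G = grads.T @ grads / len(grads)  # gradient-side second moment
    return A, G, np.kron(A, G)

rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 3))   # hypothetical layer inputs
grads = rng.normal(size=(500, 2))  # hypothetical back-propagated gradients
A, G, F_block = kfac_block(acts, grads)

# Every eigenvalue of the block is (eigenvalue of A) * (eigenvalue of G).
eig_prod = np.sort(np.outer(np.linalg.eigvalsh(A), np.linalg.eigvalsh(G)).ravel())
eig_block = np.linalg.eigvalsh(F_block)
print(np.allclose(eig_prod, eig_block))
```

Under this factorization, a block eigenvalue decays at the product of the rates at which the corresponding `A` and `G` eigenvalues decay, which is one way to read the activation-side and gradient-side split described above.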
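
### 3. Gauge Quotient Theorem
Under gradient flow with respect to a G-invariant metric, the rate transfers to the quotient space Theta/G:
- **SGD** qualifies (its implicit regularization preserves the symmetry)
- **Standard Adam does not qualify**
- The paper constructs **[[ddcadam|DDCAdam]]** (Dead-Direction-Calibrated Adam): a G-equivariant preconditioner in the Adam family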
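
The contrast between the two optimizers can be illustrated with a small equivariance check. The sketch assumes a rotation group acting on a two-parameter, rotation-invariant loss, uses a plain full-batch gradient step as a stand-in for SGD, and uses only the first bias-corrected Adam step. The symmetry group and setting in the paper may differ, and DDCAdam itself is not reproduced here.

```python
import numpy as np

def grad(theta):
    """Gradient of the rotation-invariant loss 0.5 * (||theta||^2 - 1)^2."""
    return 2.0 * (theta @ theta - 1.0) * theta

def gradient_step(theta, lr=0.1):
    """Plain gradient step (deterministic stand-in for SGD)."""
    return theta - lr * grad(theta)

def adam_first_step(theta, lr=0.1, eps=1e-8):
    """First Adam step: after bias correction, m_hat = g and v_hat = g**2."""
    g = grad(theta)
    return theta - lr * g / (np.sqrt(g ** 2) + eps)

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def equivariance_gap(step, theta, R):
    """Distance between 'rotate then step' and 'step then rotate'."""
    return np.linalg.norm(step(R @ theta) - R @ step(theta))

theta = np.array([2.0, 0.5])
R = rotation(np.pi / 6)
sgd_gap = equivariance_gap(gradient_step, theta, R)
adam_gap = equivariance_gap(adam_first_step, theta, R)
print("gradient step gap:", sgd_gap)
print("Adam step gap:", adam_gap)
```

The gradient step commutes with the rotation up to rounding error, while Adam's per-coordinate normalization does not. A non-equivariant update like this does not descend to a well-defined update on the quotient space.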
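
## Practical Significance

**Reading off the Watanabe triple from a single checkpoint**: (lambda, m, nu) is computed with one forward pass and one backward pass, with no posterior sampling, which makes SLT analysis practical for large-scale networks.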

## Related Concepts

- [[dead-direction|Dead Direction]]
- [[singular-learning-theory|Singular Learning Theory]]
- [[information-geometry|Information Geometry]]
- [[fisher-information-metric|Fisher Information Metric]]
- [[real-log-canonical-threshold|RLCT]]
- [[kl-order|KL Order]]
- [[watanabe-triple|Watanabe's Triple]]
- [[ddcadam|DDCAdam]]

## Sources

- [arXiv](https://arxiv.org/abs/2606.05957)
- [Raw archive](raw/papers/shirodkar-dead-directions-2026.md)