20260617: 914 pages so far
concepts/linearized-neural-network.md
@@ -0,0 +1,64 @@
---
title: "Linearized Neural Network"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [deep-learning, theory, neural-networks, ntk]
sources: [raw/papers/tiwari-ticks-to-flows-2026.md]
confidence: high
---

# Linearized Neural Network

A linearized NN is a theoretical tool that replaces a neural network with its **first-order Taylor expansion around the initial parameters**. It is a core technique of [[infinite-width-limit|infinite-width theory]].

## Form

For a two-layer actor network `F(s; W)`, linearizing around the initialization `W^0` gives:

```
F_lin(s; W) = F(s; W^0) + Φ(s; W^0)(W - W^0)
```

where `Φ(s; W^0)` is the Jacobian of `F` with respect to `W`, evaluated at `W^0` (the tangent features). Its block for hidden neuron `κ` is the derivative feature:

```
Φ_κ(s; W^0) = C_κ^0 φ'(W_κ^0 · s) s^T
```
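
As a minimal sketch of the two formulas above, the pure-Python snippet below builds a small two-layer network and its linearization, then compares them after a small step away from `W^0`. The scalar output, the `tanh` activation, the fixed output weights `C0` drawn with variance `1/n`, and all function names are assumptions made for illustration; the source does not specify them.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward(s, W, C):
    """F(s; W) = sum_k C[k] * tanh(W[k] . s), with one row of W per hidden neuron."""
    return sum(c * math.tanh(dot(row, s)) for row, c in zip(W, C))

def tangent_features(s, W0, C0):
    """Phi_k(s; W0) = C0[k] * tanh'(W0[k] . s) * s, returned as one row per neuron."""
    feats = []
    for row, c in zip(W0, C0):
        dphi = 1.0 - math.tanh(dot(row, s)) ** 2  # derivative of tanh
        feats.append([c * dphi * s_i for s_i in s])
    return feats

def forward_lin(s, W, W0, C0):
    """F_lin(s; W) = F(s; W0) + <Phi(s; W0), W - W0>."""
    phi = tangent_features(s, W0, C0)
    correction = sum(
        dot(p_row, [w - w0 for w, w0 in zip(w_row, w0_row)])
        for p_row, w_row, w0_row in zip(phi, W, W0)
    )
    return forward(s, W0, C0) + correction

rng = random.Random(0)
n, d = 64, 3  # hidden width and state dimension (illustrative values)
W0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
C0 = [rng.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)]
s = [0.5, -1.0, 0.25]

# Move W a small distance from W0 and compare the network with its linearization.
eps = 1e-3
W = [[w + eps * rng.gauss(0.0, 1.0) for w in row] for row in W0]
gap = abs(forward(s, W, C0) - forward_lin(s, W, W0, C0))
print(f"|F - F_lin| after a step of size {eps}: {gap:.2e}")
```

At `W = W^0` the two models agree exactly, and for small steps the gap is second order in the step size, which is what a first-order Taylor expansion predicts.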
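
## Key properties

1. **Linear in W**: the output is an affine function of the parameters `W` (linear in `W - W^0`), but it is still a nonlinear function of the input `s`.
2. **Fixed features**: the tangent features Φ are evaluated at `W^0` and do not change during training → **lazy regime**.
3. **Gaussian output**: at large width, the output is approximately Gaussian-distributed (by the CLT).
4. **Simple gradients**: the gradient update formula simplifies considerably.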
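
Properties 1, 2, and 4 can be illustrated with a small gradient-descent loop on the linearized model. This is a sketch under assumptions that are not in the source: a scalar output, a `tanh` activation, a squared loss, made-up training pairs, and hypothetical names such as `phi_flat` and `theta`. Because the features are computed once at `W^0`, each update only needs the current residuals.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward(s, W, C):
    """F(s; W) = sum_k C[k] * tanh(W[k] . s)."""
    return sum(c * math.tanh(dot(row, s)) for row, c in zip(W, C))

def phi_flat(s, W0, C0):
    """Tangent features C0[k] * tanh'(W0[k] . s) * s, flattened over neurons."""
    out = []
    for row, c in zip(W0, C0):
        dphi = 1.0 - math.tanh(dot(row, s)) ** 2
        out.extend(c * dphi * s_i for s_i in s)
    return out

rng = random.Random(1)
n, d = 64, 2  # hidden width and state dimension (illustrative values)
W0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
C0 = [rng.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)]

# Toy regression data (made up for the example).
states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.5, 0.5]]
targets = [1.0, -1.0, 0.5, 0.0]

# Property 2: features and initial outputs are computed once and then frozen.
feats = [phi_flat(s, W0, C0) for s in states]
f0 = [forward(s, W0, C0) for s in states]

def predict(theta):
    """F_lin at every training state, with theta = W - W0 flattened."""
    return [f + dot(p, theta) for f, p in zip(f0, feats)]

def loss(theta):
    return 0.5 * sum((f - y) ** 2 for f, y in zip(predict(theta), targets))

theta = [0.0] * (n * d)
lr = 0.1
losses = [loss(theta)]
for _ in range(200):
    residuals = [f - y for f, y in zip(predict(theta), targets)]
    # Property 4: the gradient is just sum_i residual_i * Phi(s_i; W0).
    grad = [sum(r * p[j] for r, p in zip(residuals, feats)) for j in range(n * d)]
    theta = [t - lr * g for t, g in zip(theta, grad)]
    losses.append(loss(theta))

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The loop never re-evaluates the network or its derivatives, so training reduces to linear regression on frozen features. That is the practical meaning of the lazy regime in this toy setting.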
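
## Why linearize

In the proof of [[ticks-to-flows|Ticks-to-Flows]], linearization ensures that:

- The state `s̃_{t,τ}` can be written as a **polynomial** in the parameter displacement `W^τ - W^0` (via the [[ito-calculus|Itô-Taylor expansion]]).
- The gradient update formula (Equation 5) is closed in parameter space.
- The [[martingale-clt|martingale CLT]] can be applied to derive the conditional Gaussian limit.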

## Relation to the NTK

At large width, the kernel of the linearized model converges to the [[neural-tangent-kernel|Neural Tangent Kernel (NTK)]]:

```
K(s, s') = E[Φ(s; W^0) · Φ(s'; W^0)]
```

The NTK is the inner product of the parameter gradients at two inputs, and it governs the training dynamics.
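
A finite-width estimate of this kernel can be sketched by taking the inner product of the tangent features at two inputs. The snippet assumes a scalar output, a `tanh` activation, and output weights `C0` drawn with variance `1/n`, so that the sum over neurons acts as a sample average standing in for the expectation. These choices and the helper names (`phi_flat`, `empirical_ntk`, `sample_init`) are illustrative and not taken from the source.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi_flat(s, W0, C0):
    """Tangent features C0[k] * tanh'(W0[k] . s) * s, flattened over neurons."""
    out = []
    for row, c in zip(W0, C0):
        dphi = 1.0 - math.tanh(dot(row, s)) ** 2
        out.extend(c * dphi * s_i for s_i in s)
    return out

def empirical_ntk(s1, s2, W0, C0):
    """K_n(s, s') = <Phi(s; W0), Phi(s'; W0)>, summed over all entries of W."""
    return dot(phi_flat(s1, W0, C0), phi_flat(s2, W0, C0))

def sample_init(n, d, seed):
    """Draw one random initialization (W0, C0) for width n and state dimension d."""
    rng = random.Random(seed)
    W0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    C0 = [rng.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)]
    return W0, C0

s, s_prime = [1.0, 0.0], [0.6, 0.8]
for n in (16, 256, 4096):
    # Re-draw the initialization a few times to see how much the kernel varies.
    values = [empirical_ntk(s, s_prime, *sample_init(n, 2, seed)) for seed in range(5)]
    spread = max(values) - min(values)
    print(f"n={n:5d}  K(s, s') across 5 seeds: spread {spread:.3f}")
```

Under these assumptions the spread across seeds would be expected to shrink as `n` grows, which is the sense in which the finite-width kernel concentrates around its expectation.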
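
## Limitations

- Lazy training: the features do not evolve, which limits representation learning.
- It requires a small learning rate, `η = O(1/sqrt(n))`.
- It does not fully hold in practice, since feature learning is a key advantage of deep learning.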

## References

- [[neural-tangent-kernel|NTK]]
- [[infinite-width-limit|Infinite-width limit]]
- [[ticks-to-flows|Ticks to Flows]]