---
title: "Generalization Bounds"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [theory, generalization, learning-theory]
sources: [raw/papers/ortega-phd-thesis-2026.md]
confidence: high
---

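# Generalization Bounds

Generalization bounds address a central problem in learning theory: **quantifying a model's expected performance beyond its training data**. [[ortega-phd-thesis|Ortega (2026)]] provides a unified framework for generalization bounds built on [[pac-bayesian-bounds|PAC-Bayesian]] analysis and large deviation theory.
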
## Basic form

```
L_test ≤ L_train + complexity_penalty(n, P, δ)
```

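The inequality holds with probability at least 1 − δ over the draw of the training set, where complexity_penalty depends on:
- the sample size n
- the complexity of the hypothesis space (the prior P)
- the confidence parameter δ
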
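To make the shape of the penalty term concrete, the sketch below evaluates one well-known instance, the McAllester/Maurer PAC-Bayes bound for losses in [0, 1]. It is an illustration only and not necessarily the form used in the thesis. The function names are hypothetical, and the complexity term is assumed to be supplied as KL(Q‖P) between a posterior Q and the prior P.

```python
import math


def mcallester_penalty(n: int, kl_qp: float, delta: float) -> float:
    """Penalty term of the McAllester/Maurer PAC-Bayes bound.

    n      -- number of i.i.d. training samples (this form requires n >= 8)
    kl_qp  -- KL(Q || P) in nats, the complexity of posterior Q w.r.t. prior P
    delta  -- failure probability; the bound holds with probability >= 1 - delta
    """
    if n < 8:
        raise ValueError("this form of the bound requires n >= 8")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if kl_qp < 0.0:
        raise ValueError("a KL divergence cannot be negative")
    return math.sqrt((kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))


def bound_on_test_loss(train_loss: float, n: int, kl_qp: float, delta: float) -> float:
    """L_train + penalty, clipped at 1 because the loss is assumed to lie in [0, 1]."""
    return min(1.0, train_loss + mcallester_penalty(n, kl_qp, delta))


# The penalty shrinks as n grows and grows with the complexity term.
for n in (1_000, 10_000, 100_000):
    print(n, round(bound_on_test_loss(train_loss=0.05, n=n, kl_qp=20.0, delta=0.05), 4))
```

The three inputs of the function map directly onto the three dependencies listed above: more samples tighten the bound, while a larger KL term or a smaller δ loosens it.
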
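## Where classical bounds fall short

Classical bounds (VC dimension, Rademacher complexity) **break down** in deep learning:
- For over-parameterized models the VC dimension is effectively infinite relative to n, so the bound becomes vacuous
- In the interpolation regime (L_train = 0) the bound carries no information
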
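The vacuity can be seen numerically. The sketch below uses the textbook Vapnik form of the VC bound for 0-1 loss, which is stated for VC dimension d ≤ n. The function name and the sample values are assumptions chosen for illustration, not figures from the thesis.

```python
import math


def vc_penalty(n: int, d: int, delta: float) -> float:
    """Penalty term of the classical VC bound (Vapnik's form), valid for 1 <= d <= n.

    n      -- number of i.i.d. training samples
    d      -- VC dimension of the hypothesis class
    delta  -- failure probability
    """
    if not 1 <= d <= n:
        raise ValueError("this form of the bound assumes 1 <= d <= n")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt((d * (math.log(2.0 * n / d) + 1.0) + math.log(4.0 / delta)) / n)


n = 50_000  # hypothetical training-set size
for d in (10, 1_000, 25_000, 50_000):
    penalty = vc_penalty(n, d, delta=0.05)
    # For 0-1 loss, a penalty of 1 or more says nothing: the test error is at most 1 anyway.
    print(f"d={d:>6}  penalty={penalty:.3f}  vacuous={penalty >= 1.0}")
```

Under these assumptions the penalty already exceeds 1 once d reaches half the sample size, well before the over-parameterized regime in which d is far larger than n.
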
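## Breakthrough in the thesis: the PAC-Chernoff bound

Ortega's PAC-Chernoff bound:

- Is based on **large deviation theory** (non-asymptotic)
- Still provides a non-trivial bound in the interpolation regime
- Is distribution-dependent (does not assume i.i.d. data)
- Gives a quantitative explanation of [[double-descent|double descent]]
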
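As background for why a Chernoff-style argument can stay informative at zero training error, the sketch below works through the textbook Chernoff bound for a single fixed hypothesis with 0-1 loss. It is not Ortega's PAC-Chernoff bound, which covers a whole model class. It assumes i.i.d. samples for simplicity, and all function names are hypothetical. In this setting the Cramér rate function is the binary KL divergence, and inverting it gives the bound.

```python
import math


def binary_kl(q: float, p: float) -> float:
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), in nats."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def chernoff_upper_bound(train_err: float, n: int, delta: float, tol: float = 1e-10) -> float:
    """Largest p >= train_err with kl(train_err || p) <= ln(1/delta) / n, found by bisection."""
    budget = math.log(1.0 / delta) / n
    lo, hi = train_err, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_kl(train_err, mid) <= budget:
            lo = mid  # mid still satisfies the constraint, move right
        else:
            hi = mid  # mid violates the constraint, move left
    return hi


def hoeffding_upper_bound(train_err: float, n: int, delta: float) -> float:
    """Hoeffding bound for comparison: train_err + sqrt(ln(1/delta) / (2n))."""
    return min(1.0, train_err + math.sqrt(math.log(1.0 / delta) / (2.0 * n)))


# Interpolation regime: zero training error on n = 1000 samples.
print(chernoff_upper_bound(0.0, 1000, 0.05), hoeffding_upper_bound(0.0, 1000, 0.05))
```

At zero training error the Chernoff bound has the closed form 1 − δ^(1/n), which decays at rate 1/n, whereas the Hoeffding penalty decays only at rate 1/√n. This is a single-hypothesis illustration of how rate-function bounds adapt to the interpolation regime.
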
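## Unifying three generalization mechanisms

| Mechanism | How it appears in the bound |
|-----------|-----------------------------|
| Diversity | Reduces the variance term |
| Smoothness | Enlarges the rate function (stronger concentration) |
| Stochasticity | SGD noise acts as implicit KL regularization |
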
## References

- [[pac-bayesian-bounds|PAC-Bayesian bounds]]
- [[double-descent|Double descent]]
- [[ortega-phd-thesis|Ortega's thesis]]