---
title: "Edge of Stability (EoS)"
created: 2026-06-23
updated: 2026-06-23
type: concept
tags: [optimization, gradient-descent, deep-learning, sharpness, bifurcation]
sources: [gan-bifurcation-eos]
---
# Edge of Stability (EoS)
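Edge of Stability (EoS) is a counterintuitive phenomenon in gradient-descent training of deep networks: training remains stable even though the **sharpness λ exceeds the classical convergence threshold 2/η**, with a loss that is non-monotone but decreases over the long run. Cohen et al. (2022) were the first to document the phenomenon systematically through experiments.
## Core Mechanism
Classical gradient-descent analysis requires the learning rate η and the sharpness λ (the largest eigenvalue of the Hessian) to satisfy **ηλ < 2** in order to guarantee convergence. In practice, however, the sharpness of a deep network rises above this threshold during training and the loss oscillates, yet training still converges over the long run. This behavior of "running at the edge of stability" cannot be explained by classical convex optimization theory.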
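
The ηλ < 2 threshold can be checked on the simplest possible case. The sketch below is illustrative and not taken from the cited papers: it runs gradient descent on the one-dimensional quadratic f(x) = ½λx², where each step multiplies x by (1 - ηλ), so the iterates shrink exactly when |1 - ηλ| < 1, that is, when 0 < ηλ < 2. The function name `gd_on_quadratic` and all numeric values are assumptions chosen for the demonstration.

```python
def gd_on_quadratic(sharpness, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and return all iterates."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = sharpness * x      # f'(x) = sharpness * x
        x = x - lr * grad         # equivalent to x *= (1 - lr * sharpness)
        xs.append(x)
    return xs

# eta * lambda = 1.9 < 2: the iterates alternate in sign and shrink toward 0
stable = gd_on_quadratic(sharpness=1.9, lr=1.0)
# eta * lambda = 2.1 > 2: the iterates alternate in sign and grow without bound
unstable = gd_on_quadratic(sharpness=2.1, lr=1.0)

print(abs(stable[-1]), abs(unstable[-1]))
```

On a pure quadratic the sharpness is constant, so crossing the threshold always means divergence. EoS behavior therefore needs a loss whose curvature changes along the trajectory.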
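EoS dynamics typically pass through three stages:
1. **Progressive Sharpening**: early in training the sharpness rises monotonically, crosses the 2/η threshold, and the run enters the EoS regime
2. **Self-Stabilization**: the sharpness oscillates around the threshold; the loss is non-monotone but trends downward
3. **Final convergence**: the sharpness falls back below the threshold and the iterates converge to the manifold of minimizers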
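
Telling these stages apart in a real run requires tracking the sharpness and comparing it with 2/η. One common way to do that without forming the Hessian is power iteration on Hessian-vector products. The following is a minimal sketch under stated assumptions: the helper names `estimate_sharpness` and `in_eos_regime` are hypothetical, the Hessian-vector product is approximated by central finite differences of a user-supplied gradient function, and power iteration returns the eigenvalue of largest magnitude, which coincides with the sharpness only when that eigenvalue is positive.

```python
import math

def estimate_sharpness(grad_fn, w, iters=100, eps=1e-4):
    """Estimate the largest-magnitude Hessian eigenvalue at w by power iteration.

    Hessian-vector products are approximated with central finite differences
    of the gradient, so only grad_fn is needed (no explicit Hessian).
    """
    n = len(w)
    v = [1.0 / math.sqrt(n)] * n                      # fixed unit start vector (assumption)
    eigenvalue = 0.0
    for _ in range(iters):
        w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
        w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
        g_plus, g_minus = grad_fn(w_plus), grad_fn(w_minus)
        hv = [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]   # approx. H @ v
        eigenvalue = sum(h * vi for h, vi in zip(hv, v))              # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(h * h for h in hv))
        if norm == 0.0:
            break
        v = [h / norm for h in hv]
    return eigenvalue

def in_eos_regime(sharpness, lr):
    """True when the sharpness exceeds the classical threshold 2 / lr."""
    return sharpness > 2.0 / lr

# Toy check on f(w) = 0.5 * (3 * w0**2 + w1**2), whose Hessian is diag(3, 1)
quad_grad = lambda w: [3.0 * w[0], 1.0 * w[1]]
lam = estimate_sharpness(quad_grad, [0.5, -0.2])
print(lam, in_eos_regime(lam, lr=0.5), in_eos_regime(lam, lr=0.7))
```

For a neural network, `grad_fn` would be replaced by the framework's gradient (or an exact Hessian-vector product), and the estimate would be logged every few steps alongside the loss.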
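## Lineage of Theoretical Explanations
- **Third-order self-stabilization** (Damian et al., 2023): the third-order term in the Taylor expansion of the loss drives the self-stabilization of sharpness
- **Multiscale loss structure** (Ma et al., 2022): subquadratic growth of the loss prevents divergence
- **Minimalist analyses** (Zhu et al., Wang et al., Song & Yun, Gan 2026): rigorous proofs of EoS convergence on low-dimensional structured losses
- **Bifurcation-theoretic framework** (Gan 2026b, [[gan-bifurcation-eos|this paper]]): reduces EoS stability to the sign of the Lyapunov coefficient of the flip bifurcation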
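
To make the flip-bifurcation viewpoint concrete: for the gradient-descent map x ↦ x - ηf′(x), the multiplier at a minimizer is 1 - ηf″, which passes through -1 exactly at ηλ = 2, the defining condition of a flip (period-doubling) bifurcation. The sketch below is a hand-picked illustration, not the construction used in Gan (2026b). It assumes the subquadratic loss f(x) = λ·log cosh(x), whose sharpness at the minimizer x = 0 is λ, and runs gradient descent slightly past the threshold. Instead of diverging, the iterates settle onto a bounded period-2 orbit {x*, -x*} with tanh(x*) = 2x*/(ηλ), which is the behavior of a supercritical flip bifurcation.

```python
import math

def gd_logcosh(lr, lam, x0=0.1, steps=400):
    """Gradient descent on f(x) = lam * log(cosh(x)); the gradient is lam * tanh(x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * lam * math.tanh(xs[-1]))
    return xs

lr, lam = 1.0, 2.2                 # eta * lambda = 2.2, just past the threshold of 2
xs = gd_logcosh(lr, lam)

a, b, c = xs[-3], xs[-2], xs[-1]   # last three iterates
print(a, b, c)                     # a and c coincide, b is their mirror image

# Residual of the period-2 condition tanh(x*) = 2 * x* / (eta * lambda)
print(abs(math.tanh(abs(c)) - 2 * abs(c) / (lr * lam)))
```

Replacing the loss with one whose curvature grows away from the minimizer (for example by adding a quartic term) makes the same crossing unstable, which is why the sign of the cubic coefficient matters.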
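## Connection to Overparameterization
Overparameterized networks admit a [[manifold-of-minimizers|manifold of minimizers]], on which the Hessian is rank-deficient. EoS dynamics can be decomposed into a periodic oscillation normal to the manifold and a sharpness-decreasing drift tangent to it; the interplay of the two produces convergence.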
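
The normal/tangential decomposition can be seen in a two-parameter toy. The sketch below is an assumption-laden illustration, not a model from the cited papers. It uses L(x, y) = ½x²y², whose minimizers include the line x = 0; on that line the Hessian is diag(y², 0), so it is rank-deficient with sharpness y². Here x is the normal direction and y the tangential one. Starting with y² > 2/η, x oscillates with growing amplitude, each oscillation shrinks y, and once y² drops below 2/η the oscillation dies out. The toy has no progressive sharpening, so the sharpness settles below the threshold instead of hovering at it.

```python
import math

def toy_sharpness(x, y):
    """Largest Hessian eigenvalue of L(x, y) = 0.5 * x**2 * y**2 (closed form, 2x2 case)."""
    half_trace = 0.5 * (x * x + y * y)            # Hessian = [[y^2, 2xy], [2xy, x^2]]
    det = -3.0 * x * x * y * y                    # y^2 * x^2 - (2xy)^2
    return half_trace + math.sqrt(half_trace ** 2 - det)

def run_toy(lr=1.0, x0=0.01, y0=1.5, steps=300):
    """Gradient descent on the toy loss; records loss and sharpness before each step."""
    x, y = x0, y0
    losses, sharpnesses = [], []
    for _ in range(steps):
        losses.append(0.5 * x * x * y * y)
        sharpnesses.append(toy_sharpness(x, y))
        gx, gy = x * y * y, x * x * y             # gradient of L
        x, y = x - lr * gx, y - lr * gy           # normal step (x) and tangential drift (y)
    return losses, sharpnesses, (x, y)

lr = 1.0
losses, sharpnesses, (x_final, y_final) = run_toy(lr=lr)

print("threshold 2/lr:", 2.0 / lr)
print("sharpness start/end:", sharpnesses[0], sharpnesses[-1])
print("loss start/peak/end:", losses[0], max(losses), losses[-1])
```

The loss rises above its initial value while x oscillates, then falls to essentially zero once the tangential drift has lowered the sharpness below 2/η.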
## References
- Cohen et al. (2022). Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability.
- Damian et al. (2023). Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability.
- [[gan-bifurcation-eos|Gan (2026b): bifurcation-theoretic framework]]
- [[product-stability|Gan (2026): product stability]]
- [[flip-bifurcation]]
- [[first-lyapunov-coefficient]]