20260429: some new stuff
61
concepts/manifold-constrained-hyper-connections.md
Normal file
@@ -0,0 +1,61 @@
---
title: "Manifold-Constrained Hyper-Connections (mHC)"
domain: "Deep Learning / Network Architecture"
tags: [architecture, residual-connections, training-stability, transformer]
sources: [[deepseek-v4-million-token-context]]
---

# Manifold-Constrained Hyper-Connections (mHC)

> **Type**: Concept (Tier 1 - Core)
> **Source**: [[deepseek-v4-million-token-context]], Xie et al. (2026)

## Definition

mHC (Manifold-Constrained Hyper-Connections) refines standard Hyper-Connections (HC): it constrains the residual mapping matrix to the Birkhoff polytope (the manifold of doubly stochastic matrices), which resolves the numerical instability that arises when many layers are stacked.
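
## Core Mechanism

### 1. Standard Hyper-Connections

Standard HC widens the residual stream from ℝᵈ to ℝⁿʰᶜˣᵈ, where n_hc is the expansion factor (the number of parallel residual streams), and introduces three learnable linear mappings:

- **Input mapping Aₗ** ∈ ℝ¹ˣⁿʰᶜ: fuses the expanded residual state into the layer input
- **Residual transform Bₗ** ∈ ℝⁿʰᶜˣⁿʰᶜ: mixes the residual state across streams
- **Output mapping Cₗ** ∈ ℝⁿʰᶜˣ¹: injects the layer output back into the residual stream

Update rule: Xₗ₊₁ = BₗXₗ + CₗFₗ(AₗXₗ)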
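
The update rule can be read as three matrix products. The NumPy sketch below is a minimal illustration only: the function name `hc_update` and the stand-in `layer_fn` (playing the role of Fₗ) are hypothetical and not taken from the source.

```python
import numpy as np

def hc_update(X, A, B, C, layer_fn):
    """One Hyper-Connections step: X_next = B X + C F(A X).

    X: (n_hc, d) expanded residual state
    A: (1, n_hc) input mapping
    B: (n_hc, n_hc) residual transform
    C: (n_hc, 1) output mapping
    layer_fn: maps a (1, d) array to a (1, d) array (stand-in for F_l)
    """
    layer_in = A @ X                # (1, d): fuse the streams into one layer input
    layer_out = layer_fn(layer_in)  # (1, d): apply the layer
    return B @ X + C @ layer_out    # (n_hc, d): mix streams, inject layer output
```

With n_hc = 1 and A = B = C = [[1]], the step reduces to the ordinary residual connection X + F(X), which is a quick sanity check for the shapes.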
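
### 2. Manifold Constraint

The core innovation of mHC is to constrain Bₗ to the manifold ℳ of doubly stochastic matrices (the Birkhoff polytope), with n = n_hc:

```
ℳ = {M ∈ ℝⁿˣⁿ | M1ₙ = 1ₙ, 1ₙᵀM = 1ₙᵀ, M ≥ 0}
```

This guarantees the spectral norm ‖Bₗ‖₂ ≤ 1, so the residual transform is **non-expansive**, which keeps forward and backward propagation numerically stable.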
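
The spectral-norm bound follows from a standard norm inequality. The short derivation below is a supporting sketch that is not spelled out in the source, where ‖·‖₁ is the maximum column sum and ‖·‖∞ is the maximum row sum:

```latex
\|M\|_2^2 \;\le\; \|M\|_1 \,\|M\|_\infty
  = \Big(\max_j \sum_i |M_{ij}|\Big)\Big(\max_i \sum_j |M_{ij}|\Big)
  = 1 \cdot 1 = 1,
\qquad
\|M\|_2 \;\ge\; \frac{\|M\mathbf{1}_n\|_2}{\|\mathbf{1}_n\|_2} = 1 .
```

So every doubly stochastic matrix has spectral norm exactly 1. The product of doubly stochastic matrices is again doubly stochastic, so the composed residual transform across any number of layers also has spectral norm 1.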
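
### 3. Dynamic Parameterization

The parameters of the three mappings are generated dynamically from the input, and each is split into a dynamic and a static component:

- The input Xₗ is first normalized with RMSNorm
- The dynamic component is produced by a learnable weight matrix
- The static component is supplied by a learnable bias
- The gating factor α is initialized to a small value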
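
The source does not give the exact formula, so the sketch below is one plausible reading of the four points above and not the authors' parameterization. It assumes the state is flattened before RMSNorm and that the gate scales only the dynamic component, i.e. raw = α · (RMSNorm(x) W) + b. The names `rms_norm`, `raw_mapping`, `W`, `b` and `alpha` are hypothetical.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMSNorm without a learnable gain: divide by the root mean square."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def raw_mapping(X, W, b, alpha):
    """Unconstrained mapping = alpha * dynamic + static (assumed form).

    X: (n_hc, d) residual state
    W: (n_hc * d, b.size) learnable weight producing the dynamic component
    b: learnable bias with the target shape, e.g. (n_hc, n_hc) for B_l
    alpha: scalar gate, initialized small so the static part dominates early
    """
    x = rms_norm(X.reshape(-1))         # flatten, then normalize
    dynamic = (x @ W).reshape(b.shape)  # input-dependent component
    return alpha * dynamic + b          # gated dynamic part plus static bias
```

Under this reading, a small initial α means each mapping starts close to its static bias and the input-dependent part is phased in as training increases α.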

### 4. Applying the Constraints

- Aₗ and Cₗ: a Sigmoid keeps them non-negative and bounded
- Bₗ: projected onto the manifold of doubly stochastic matrices with the **Sinkhorn-Knopp algorithm** (20 iterations)
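
Both constraints can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the source only names Sigmoid and 20 Sinkhorn-Knopp iterations, so exponentiating the raw values to make them strictly positive before the iterations is an assumption, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function; outputs lie strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def sinkhorn_knopp(logits, n_iters=20):
    """Approximate doubly stochastic matrix from unconstrained square logits.

    Alternates row and column normalization (Sinkhorn-Knopp).
    The exp() step is an assumption used to obtain positive entries.
    """
    M = np.exp(logits - logits.max())          # strictly positive, overflow-safe
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)   # make every row sum to 1
        M = M / M.sum(axis=0, keepdims=True)   # make every column sum to 1
    return M
```

Because the column normalization runs last, columns sum to one up to rounding, while rows are only approximately normalized after a finite number of iterations. A fixed iteration count also keeps the operation a plain sequence of differentiable array operations.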

## Comparison with Standard HC

| Property | Hyper-Connections | mHC |
|----------|-------------------|-----|
| Deep training | Numerically unstable | Stable |
| Residual transform | Unconstrained | Doubly stochastic constraint |
| Spectral norm | Unbounded | ≤ 1 |
| Applicability | Shallow networks | Deep stacks |

## Related Concepts

- [[muon-optimizer]]: the Muon optimizer (mHC and Muon together improve training stability)
- [[depth-scaling-signal-degradation]]: signal degradation when scaling depth

---

*Last Updated: 2026-04-27*