20260617: currently 914 pages
concepts/sparse-autoencoder.md
---
title: "Sparse Autoencoder (SAE)"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [interpretability, architecture, dictionary-learning, sparse-coding]
sources: [raw/papers/zhang-geometric-sae-2026.md]
confidence: high
---

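# Sparse Autoencoder (SAE)

The SAE is a **core tool of mechanistic interpretability**: by learning an overcomplete, sparse representation, it decomposes neural-network activations into interpretable features.
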
## Basic Structure

```
z = W_enc (x - b_pre) + b_enc   # encode: map the n-dim activation to d dims (d >> n)
a = Act(z)                       # sparse activation
x̂ = W_dec a + b_dec              # decode: reconstruct the original activation
```

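The three lines above can be sketched as a runnable NumPy forward pass. This is a minimal illustration, not a reference implementation: it assumes `Act` is a ReLU, uses toy sizes `n = 8` and `d = 32`, random weights, a negative encoder bias chosen only so that some codes come out exactly zero, and ties `b_pre` to `b_dec` (a common convention, assumed here). `sae_forward` is a hypothetical helper name; training is omitted.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec, b_pre):
    """Encode an n-dim activation x into d codes, then reconstruct it."""
    z = W_enc @ (x - b_pre) + b_enc   # pre-activations, shape (d,)
    a = np.maximum(z, 0.0)            # Act = ReLU (an assumption; variants differ)
    x_hat = W_dec @ a + b_dec         # reconstruction, shape (n,)
    return a, x_hat


rng = np.random.default_rng(0)
n, d = 8, 32                          # toy sizes with d > n (overcomplete)
W_enc = rng.normal(size=(d, n)) / np.sqrt(n)
W_dec = rng.normal(size=(n, d)) / np.sqrt(d)
b_enc = np.full(d, -0.5)              # negative bias pushes many codes to exactly 0
b_dec = np.zeros(n)
b_pre = b_dec                         # tied to b_dec here (assumed convention)
x = rng.normal(size=n)

a, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec, b_pre)
print(f"active codes: {np.count_nonzero(a)} of {d}")
```

Because the weights are untrained, `x_hat` is not yet a good reconstruction; the sketch only shows the shapes and the data flow of encode, sparse activation, and decode.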
## Main Variants

[[geometric-sae-concepts|Zhang et al. (2026)]] group SAEs into two families:

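### [[absolute-gating|Absolute gating]]

Each neuron's activation is independent of the others:

- **ReLU SAE**: `L = ‖x - x̂‖² + λ‖a‖₁`; the L1 penalty enforces sparsity.
- **Gated SAE**: adds a gating mechanism to improve selectivity, separating the decision of *which* neurons fire from the estimate of *how strongly* they fire.
- **JumpReLU SAE**: uses the JumpReLU activation, which passes a pre-activation through only when it exceeds a learned threshold θ and outputs 0 otherwise.
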
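A minimal NumPy sketch of the three absolute-gating rules and the ReLU SAE loss, under stated assumptions: the threshold `theta` is fixed rather than learned, and `gated` takes two precomputed pre-activations (the published Gated SAE also ties the weights of its gate and magnitude paths, which is omitted here). All function names are hypothetical. The demo checks the defining property: changing one neuron's pre-activation leaves every other neuron's output untouched.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def jumprelu(z, theta):
    # Keep z where it exceeds the per-neuron threshold theta, otherwise output 0.
    return np.where(z > theta, z, 0.0)


def gated(z_gate, z_mag):
    # Simplified gate: the gate path decides WHETHER a neuron fires, the
    # magnitude path decides HOW STRONGLY. Both inputs are assumed precomputed.
    return (z_gate > 0).astype(z_mag.dtype) * relu(z_mag)


def relu_sae_loss(x, x_hat, a, lam):
    # L = ||x - x_hat||^2 + lam * ||a||_1
    return float(np.sum((x - x_hat) ** 2) + lam * np.sum(np.abs(a)))


z = np.array([-1.0, 0.2, 0.7, 1.5])
z_bumped = z.copy()
z_bumped[0] = 10.0                  # change only neuron 0
# Absolute gating: the outputs of neurons 1..3 are identical in both cases.
print(relu(z)[1:], relu(z_bumped)[1:])
print(jumprelu(z, theta=0.5))
```

In each rule the output of neuron `i` is a function of `z[i]` alone, which is exactly what "absolute" means here; the loss is where the sparsity pressure (λ) enters during training.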
### [[relative-gating|Relative gating]]
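A neuron's activation depends on the other neurons (competitive selection):

- **Top-K SAE**: keeps only the k largest activations and sets the rest to zero.
- **Matching Pursuit SAE**: iteratively selects the neuron that best explains what remains of the input (the residual), then removes its contribution before the next step.
- **SPaDE**: structured sparse decomposition.
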
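A minimal NumPy sketch of two relative-gating rules. The function names are hypothetical, and the matching pursuit shown is the textbook greedy algorithm in a nonnegative form with unit-norm dictionary atoms (an assumption), which may differ in detail from the Matching Pursuit SAE in the cited paper. The demo shows the competitive property: raising only neuron 0 pushes neuron 3 out of the top-k.

```python
import numpy as np


def topk_activation(z, k):
    # Keep only the k largest pre-activations (clipped at 0); zero the rest.
    a = np.zeros_like(z)
    idx = np.argpartition(z, -k)[-k:]   # indices of the k largest entries
    a[idx] = np.maximum(z[idx], 0.0)
    return a


def matching_pursuit(x, W_dec, n_steps):
    # Nonnegative matching pursuit over a dictionary whose columns (atoms)
    # are assumed to have unit norm; W_dec has shape (n, d).
    residual = np.array(x, dtype=float)
    a = np.zeros(W_dec.shape[1])
    for _ in range(n_steps):
        scores = W_dec.T @ residual     # correlation of every atom with the residual
        j = int(np.argmax(scores))
        if scores[j] <= 0:              # no atom reduces the residual any further
            break
        a[j] += scores[j]
        residual -= scores[j] * W_dec[:, j]
    return a, residual


z = np.array([0.1, 3.0, -2.0, 1.0, 0.5])
z_bumped = z.copy()
z_bumped[0] = 5.0                       # change only neuron 0
# Relative gating: neuron 3 is active for z but zeroed for z_bumped.
print(topk_activation(z, k=2), topk_activation(z_bumped, k=2))
```

With unit-norm atoms, each pursuit step lowers the squared residual norm by `scores[j] ** 2`, so the reconstruction error never increases; which atom is picked at each step depends on all the others, hence "relative".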
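## Core Idea

The SAE rests on the [[linear-representation-hypothesis|linear representation hypothesis]]: semantic concepts correspond to directions in activation space and can be combined linearly. By enforcing sparsity, the SAE disentangles these directions so that individual SAE neurons (latents) tend toward [[polysemanticity|monosemanticity]].
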
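Read against the decoder in Basic Structure, the hypothesis can be written as below, treating each decoder column $\mathbf{w}_i$ (the $i$-th column of $W_{\text{dec}}$; notation introduced here for illustration) as one concept direction and the sparse code $a$ as the coefficients of the few concepts present in $x$:

$$
x \;\approx\; \hat{x} \;=\; W_{\text{dec}}\, a + b_{\text{dec}} \;=\; b_{\text{dec}} + \sum_{i=1}^{d} a_i\, \mathbf{w}_i, \qquad \|a\|_0 \ll d
$$

Without the sparsity constraint, an overcomplete dictionary ($d \gg n$) of full rank $n$ admits infinitely many codes $a$ that reconstruct $x$ equally well; sparsity is what singles out a small set of active directions.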
## References

- [[polysemanticity|Polysemanticity / monosemanticity]]
- [[mechanistic-interpretability|Mechanistic interpretability]]
- [[linear-representation-hypothesis|Linear representation hypothesis]]
- [[geometric-sae-concepts|Geometric framework paper]]