title	created	type
Geometric SAE Paper Integration Review	2026-06-17	review

📌 Basic Information

Paper: A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders
Authors: Chenhao Zhang, Chris Lin, Su-In Lee (University of Washington)
Field: cs.LG / Mechanistic Interpretability
arXiv: 2606.07007v1 (2026-06-05)

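🎯 Core Concepts

sparse-autoencoder: the core tool of mechanistic interpretability; an overcomplete sparse dictionary that disentangles superposed representations
polysemanticity: the central challenge, and target, of neuron interpretability
concept-learning: detection → separation → approximation, with geometric conditions that escalate at each stage
formal-concept-analysis / concept-lattice: the mathematical framework for organizing the many-to-many relation between neurons and concepts
absolute-gating: an SAE architecture classification that determines the geometric properties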
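
To make the "overcomplete sparse dictionary" idea concrete, here is a minimal NumPy sketch of a plain ReLU sparse autoencoder forward pass. It is not the paper's architecture and does not model the absolute/relative gating distinction. The weights are random rather than trained, and every name (`sae_forward`, `W_enc`, `activation_sets`, the dimensions) is an assumption made for illustration. The last line collects, for each dictionary neuron, the set of inputs on which it fires, which is the kind of object the set-based view of concepts works with.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Forward pass of a plain ReLU sparse autoencoder (illustrative only)."""
    # Encode: non-negative codes; ReLU zeroes out most entries.
    z = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    # Decode: reconstruct the input as a combination of dictionary rows.
    x_hat = z @ W_dec + b_dec
    return z, x_hat

rng = np.random.default_rng(0)
d_model, d_dict, n = 4, 16, 8           # overcomplete: d_dict > d_model
x = rng.normal(size=(n, d_model))       # stand-in for model activations
W_enc = rng.normal(size=(d_model, d_dict)) * 0.5
b_enc = -0.5 * np.ones(d_dict)          # negative bias pushes codes toward zero
W_dec = rng.normal(size=(d_dict, d_model)) * 0.5
b_dec = np.zeros(d_model)

z, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)

# Activation set of neuron j: indices of the inputs on which it fires.
activation_sets = [set(np.flatnonzero(z[:, j] > 0).tolist()) for j in range(d_dict)]
```

In a trained SAE the encoder and decoder would be fit with a reconstruction loss plus a sparsity penalty; that training loop is omitted here.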

🔗 Concept Network

Core chain:

Superposition → Polysemanticity → SAE → Absolute/Relative Gating
    ↓                                    ↓
Linear Rep. Hypothesis    ←→    Hyperplane Arrangements
    ↓                                    ↓
Concept Learning ←→ Formal Concept Analysis → Concept Lattice
    ↓
Feature Splitting / Absorption / Family

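Links to existing knowledge: linear-representation-hypothesis (already present) and superposition (newly added) bridge this paper to existing wiki concepts. This is the first paper integration in the wiki to cover mechanistic interpretability.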

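📚 Wiki Integration

New pages: 14 (1 paper + 12 concepts + 1 raw)
Total size: 855 → 868 pages (+13; the review is not counted)
Brand-new subfield: mechanistic interpretability (mech interp), which previously had zero coverage in the wiki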

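💡 Key Insights

Concept = set is the most elegant starting point: drop the linear assumption that "concept = direction" and define a concept directly as a set of data points. This simple abstraction gives the whole SAE analysis geometric clarity: concept learning is set alignment, and neuron interpretation is set characterization.
The three-level learning hierarchy is an engineering guide: Detection (coverage), Separation (exclusivity), Approximation (tight enclosure). Each level corresponds to different application scenarios and geometric conditions. Theorem 5.8 (approximation ↔ convexity) is the fundamental bottleneck limiting what SAEs can do.
The concept lattice resolves the ambiguity of interpretation: FCA reveals that concept learning and neuron interpretation are not dual, so the two need not agree. The concept lattice organizes the many-to-many relations and avoids the information loss that comes from forcing a single "best match".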
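
As a rough illustration of how Formal Concept Analysis organizes a many-to-many relation, the sketch below enumerates the formal concepts of a tiny, made-up neuron-to-concept table. The table, the neuron and concept names, and the choice to treat neurons as FCA objects and concepts as FCA attributes are all assumptions made for this example, not the paper's construction. Each resulting pair (extent, intent) is a maximal group of neurons together with exactly the concepts they all share.

```python
from itertools import combinations

# Hypothetical toy relation: neuron -> set of concepts it responds to.
relation = {
    "n1": {"cat", "animal"},
    "n2": {"dog", "animal"},
    "n3": {"animal"},
}
neurons = sorted(relation)
concepts = sorted(set().union(*relation.values()))

def common_concepts(neuron_group):
    """Concepts shared by every neuron in the group (all concepts if empty)."""
    shared = set(concepts)
    for n in neuron_group:
        shared &= relation[n]
    return frozenset(shared)

def common_neurons(concept_group):
    """Neurons that respond to every concept in the group."""
    return frozenset(n for n in neurons if concept_group <= relation[n])

def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every subset of neurons.

    Brute force, exponential in the number of neurons; fine for a toy table.
    """
    found = set()
    for k in range(len(neurons) + 1):
        for group in combinations(neurons, k):
            intent = common_concepts(group)
            extent = common_neurons(intent)
            found.add((extent, intent))
    return found

lattice = formal_concepts()
for extent, intent in sorted(lattice, key=lambda c: len(c[0])):
    print(sorted(extent), "<->", sorted(intent))
```

On this toy table the lattice keeps "animal" attached to all three neurons at once, rather than assigning it to one "best" neuron. Ordering the pairs by inclusion of their extents gives the lattice structure.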