---
title: "A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders"
source_url: https://arxiv.org/abs/2606.07007
ingested: 2026-06-17
sha256: <computed>
---
# A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders
**Authors:** Chenhao Zhang, Chris Lin, Su-In Lee — University of Washington, Paul G. Allen School of CSE
**arXiv:** 2606.07007v1 [cs.LG] (2026-06-05)
**Published:** Preprint, June 8, 2026
## Abstract
The paper presents a unified mathematical framework for a geometric understanding of concept learning and neuron interpretation in sparse autoencoders (SAEs). It formalizes concepts as sets of data points and casts concept learning as a set-alignment problem between human-defined and model-induced concepts. It distinguishes three increasingly strong notions of learning (detection, separation, and approximation) and yields geometric conditions, error bounds, and capacity constraints. It gives a set-theoretic account of SAE phenomena including feature splitting, feature absorption, feature families, and hierarchical concepts. Finally, it connects concept learning and neuron interpretation through formal concept analysis, showing that the two directions need not agree and that their many-to-many structure can be organized by concept lattices.
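To make the "concepts as sets of data points" framing concrete, here is a minimal Python sketch of set alignment on toy data. Everything in it is an illustrative assumption rather than the paper's formalism: the helper names `induced_concept` and `alignment`, the activation threshold, and the choice of recall, precision, and Jaccard overlap as alignment scores. In particular, these scores are generic set-similarity measures and are not the paper's definitions of detection, separation, or approximation.

```python
def induced_concept(activations, threshold=0.0):
    """Model-induced concept: indices of data points where the neuron fires.

    Assumption: "fires" means activation strictly above `threshold`.
    """
    return {i for i, a in enumerate(activations) if a > threshold}


def alignment(human, induced):
    """Generic overlap scores between a human-defined and an induced concept."""
    inter = len(human & induced)
    union = len(human | induced)
    return {
        "recall": inter / len(human) if human else 0.0,        # share of the human concept covered
        "precision": inter / len(induced) if induced else 0.0,  # share of firings inside the concept
        "jaccard": inter / union if union else 0.0,             # symmetric set overlap
    }


# Toy data: 8 data points, one hypothetical SAE neuron.
human_concept = {0, 1, 2, 3}
neuron_acts = [0.9, 0.4, 0.0, 0.7, 0.0, 0.2, 0.0, 0.0]

induced = induced_concept(neuron_acts)
scores = alignment(human_concept, induced)
print(sorted(induced), scores)
```

On this toy input the neuron misses one member of the human concept (point 2) and fires on one point outside it (point 5), so the two sets overlap without coinciding.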
## Key Concepts
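- [[sparse-autoencoder|sparse autoencoder]] / [[polysemanticity|polysemanticity]]
- [[mechanistic-interpretability|mechanistic interpretability]]
- [[concept-learning|concept learning (geometric)]] / [[formal-concept-analysis|formal concept analysis]]
- [[feature-splitting|feature splitting]] / [[feature-absorption|feature absorption]] / [[feature-family|feature family]]
- [[absolute-gating|absolute gating vs. relative gating]]
- [[hyperplane-arrangements|hyperplane arrangements]]
- [[concept-lattice|concept lattice]]
- [[superposition|superposition]]
- [[linear-representation-hypothesis|linear representation hypothesis]]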
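
As a rough illustration of how a concept lattice can organize a many-to-many relation, the sketch below enumerates the formal concepts of a tiny binary context using the standard derivation operators from formal concept analysis. The context itself is an assumption made up for illustration: the human concept names, the neuron names, and the reading of a pair `(c, n)` as "neuron `n` responds to concept `c`" are all hypothetical, and the paper may define its context differently. The brute-force enumeration is only practical for very small contexts.

```python
from itertools import combinations


def common_neurons(concepts, relation, all_neurons):
    """Derivation operator: neurons related to every concept in `concepts`."""
    return frozenset(n for n in all_neurons
                     if all((c, n) in relation for c in concepts))


def common_concepts(neurons, relation, all_concepts):
    """Derivation operator: concepts related to every neuron in `neurons`."""
    return frozenset(c for c in all_concepts
                     if all((c, n) in relation for n in neurons))


def formal_concepts(relation):
    """Enumerate all (extent, intent) pairs by closing every subset of concepts."""
    all_concepts = sorted({c for c, _ in relation})
    all_neurons = sorted({n for _, n in relation})
    found = set()
    for r in range(len(all_concepts) + 1):
        for subset in combinations(all_concepts, r):
            intent = common_neurons(subset, relation, all_neurons)
            extent = common_concepts(intent, relation, all_concepts)
            found.add((extent, intent))
    return found


# Hypothetical context: which neurons respond to which human concepts.
relation = {
    ("dog", "n1"), ("dog", "n2"),
    ("cat", "n1"), ("cat", "n3"),
    ("car", "n4"),
}

lattice = formal_concepts(relation)
for extent, intent in sorted(lattice, key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(extent), sorted(intent))
```

In this toy context, `n1` is shared by `dog` and `cat`, so the pair (`{cat, dog}`, `{n1}`) appears as a lattice node above the two more specific nodes for `dog` and `cat`.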