| title | source_url | ingested | sha256 |
|---|---|---|---|
| A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders | https://arxiv.org/abs/2606.07007 | 2026-06-17 | <computed> |
A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders
Authors: Chenhao Zhang, Chris Lin, Su-In Lee — University of Washington, Paul G. Allen School of CSE
arXiv: 2606.07007v1 [cs.LG] (2026-06-05)
Published: Preprint, June 8, 2026
Abstract
The paper presents a unified mathematical framework for a geometric understanding of concept learning and neuron interpretation in sparse autoencoders (SAEs). It formalizes concepts as sets of data points and casts concept learning as a set-alignment problem between human-defined and model-induced concepts. It distinguishes three increasingly strong notions of learning (detection, separation, and approximation) and derives geometric conditions, error bounds, and capacity constraints. It also gives a set-theoretic account of SAE phenomena including feature splitting, feature absorption, feature families, and hierarchical concepts. Finally, it connects concept learning and neuron interpretation through formal concept analysis, showing that the two directions need not agree and that their many-to-many structure can be organized by concept lattices.
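To make the formal concept analysis (FCA) step more concrete, here is a minimal sketch of standard FCA on a toy binary context, assuming a neuron is "active" on a data point whenever it fires above zero. The data points (`x1`…`x4`), neuron names (`n_animal`, `n_dog`, and so on), and the helper functions are hypothetical and chosen only for illustration. The sketch does not reproduce the paper's construction or its detection, separation, or approximation criteria. Each formal concept pairs a set of points (the extent) with the set of neurons active on all of them (the intent), and the two sets determine each other.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical binary context: which toy data points activate which SAE neurons.
# Assumption: "active" means the neuron fires above zero; all values are invented.
CONTEXT: Dict[str, Set[str]] = {
    "x1": {"n_animal", "n_dog"},
    "x2": {"n_animal", "n_dog"},
    "x3": {"n_animal", "n_cat"},
    "x4": {"n_vehicle"},
}

Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def common_neurons(points: Iterable[str], context: Dict[str, Set[str]],
                   all_neurons: Set[str]) -> FrozenSet[str]:
    """Derivation operator A -> A': neurons active on every point in A."""
    result = set(all_neurons)
    for p in points:
        result &= context[p]
    return frozenset(result)


def common_points(neurons: FrozenSet[str],
                  context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Derivation operator B -> B': points on which every neuron in B is active."""
    return frozenset(p for p, active in context.items() if neurons <= active)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate all (extent, intent) pairs with A' = B and B' = A.

    Brute force: closing every subset A of points yields the concept (A'', A').
    This is exponential in the number of points, so it only suits toy contexts.
    """
    points = sorted(context)
    all_neurons = set().union(*context.values())
    concepts = set()
    for r in range(len(points) + 1):
        for subset in combinations(points, r):
            intent = common_neurons(subset, context, all_neurons)
            extent = common_points(intent, context)
            concepts.add((extent, intent))
    # Sort by extent size; ordering concepts by extent inclusion gives the lattice.
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


if __name__ == "__main__":
    for extent, intent in formal_concepts(CONTEXT):
        print(sorted(extent), sorted(intent))
```

In this toy context the points `x1`, `x2`, `x3` share only `n_animal`, while `x1` and `x2` additionally share `n_dog`. As a result, the concept with intent `{n_animal, n_dog}` sits below the concept with intent `{n_animal}` in the lattice. This is one simple way a hierarchical, many-to-many relation between point sets and neurons can show up.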