20260625: Lots of new content
raw/papers/dao-transformers-are-ssms-2024.md
Normal file
@@ -0,0 +1,33 @@
---
title: "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality"
source: arXiv
source_id: 2405.21060
authors:
- Tri Dao (Princeton University)
- Albert Gu (Carnegie Mellon University)
published: 2024-05-31
venue: ICML 2024
categories:
- cs.LG
---

# Transformers are SSMs

## Abstract
While Transformers dominate language modeling, state-space models (SSMs) such as Mamba have matched or outperformed them at small-to-medium scale. This paper shows that the two model families are closely related: a framework of theoretical connections, **structured state space duality (SSD)**, links SSMs to variants of attention through decompositions of structured **semiseparable matrices**. The SSD framework yields Mamba-2, whose core layer refines Mamba's selective SSM and runs 2-8x faster, while remaining competitive with Transformers on language modeling.

## Core Contributions
1. **SSD Framework**: Equivalence between SSMs and semiseparable matrices, which connects the SSM recurrence to attention-like quadratic forms
2. **Structured Masked Attention (SMA)**: Generalizes linear attention with data-dependent position masks
3. **SSD Algorithm**: Block decomposition of semiseparable matrices, leveraging both linear (recurrent) and quadratic (attention-like) forms
4. **Mamba-2 Architecture**: Multi-head SSM design with tensor parallelism support
5. **Systems Optimizations**: Tensor parallelism (TP), sequence parallelism, variable-length training
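
The duality behind contributions 1 and 3 can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single head with a scalar decay `a_t` per step and a scalar input channel, and every function name (`ssd_recurrent`, `ssd_quadratic`, `ssd_chunked`, `decay`, `dot`) is hypothetical. Under those assumptions, the same output can be computed as a linear-time recurrence, as multiplication by a lower-triangular semiseparable matrix with entries `M[i][j] = a[j+1] * ... * a[i] * <C_i, B_j>`, or by a chunked mix of the two in which each diagonal block uses the quadratic form and a small state is carried between chunks.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(u, v))


def decay(a, j, i):
    """Product a[j+1] * ... * a[i]; the empty product (i == j) is 1.0."""
    out = 1.0
    for k in range(j + 1, i + 1):
        out *= a[k]
    return out


def ssd_recurrent(a, B, C, x):
    """Linear form: h_t = a_t * h_{t-1} + B_t * x_t, y_t = <C_t, h_t>."""
    h = [0.0] * len(B[0])
    y = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        h = [a_t * h_k + b_k * x_t for h_k, b_k in zip(h, B_t)]
        y.append(dot(C_t, h))
    return y


def ssd_quadratic(a, B, C, x):
    """Quadratic form: y = M x, M[i][j] = decay(a, j, i) * <C_i, B_j> for j <= i."""
    return [
        sum(decay(a, j, i) * dot(C[i], B[j]) * x[j] for j in range(i + 1))
        for i in range(len(x))
    ]


def ssd_chunked(a, B, C, x, chunk):
    """Block form: quadratic inside each chunk, recurrent state across chunks."""
    n = len(B[0])
    h = [0.0] * n  # state at the end of the previous chunk
    y = []
    for s in range(0, len(x), chunk):
        e = min(s + chunk, len(x))
        for i in range(s, e):
            # Diagonal block: attention-like sum over positions in this chunk.
            intra = sum(decay(a, j, i) * dot(C[i], B[j]) * x[j] for j in range(s, i + 1))
            # Off-diagonal blocks: contribution of the carried state, decayed to step i.
            carry = decay(a, s - 1, i) * dot(C[i], h)
            y.append(intra + carry)
        # Advance the carried state to the last position of this chunk.
        h = [
            decay(a, s - 1, e - 1) * h[k]
            + sum(decay(a, j, e - 1) * B[j][k] * x[j] for j in range(s, e))
            for k in range(n)
        ]
    return y


# Toy inputs (arbitrary values): 5 steps, state dimension 2.
a = [0.9, 0.5, 0.8, 0.7, 0.6]
B = [[1.0, 0.0], [0.5, 1.0], [0.2, -0.3], [1.0, 1.0], [0.0, 2.0]]
C = [[0.3, 0.1], [1.0, -1.0], [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]]
x = [1.0, -2.0, 0.5, 3.0, 1.5]

ref = ssd_recurrent(a, B, C, x)
for other in (ssd_quadratic(a, B, C, x), ssd_chunked(a, B, C, x, chunk=2)):
    assert all(abs(p - q) < 1e-9 for p, q in zip(ref, other))
```

The pure-Python loops are chosen for readability only. The point of the block form is that the per-chunk work maps onto matrix multiplications while the carried state stays small, which is the trade-off the SSD algorithm exploits; a practical version would batch these operations on an accelerator.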

## Key Concepts
- Structured State Space Duality (SSD), Semiseparable Matrices
- Structured Masked Attention (SMA), Linear Attention
- Selective SSMs, Scalar SSM, Head Structure for SSMs (MIS/MVA/GVA)
- SSD Algorithm, Block Decomposition, Tensor Contraction Duality

## URL
https://arxiv.org/abs/2405.21060