While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8× faster, while continuing to be competitive with Transformers on language modeling.
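To make the duality concrete, here is a minimal sketch (my own NumPy illustration, not the authors' code; the scalar variables `a`, `b`, `c`, `x` are assumptions for a toy example) showing that, for a scalar selective SSM, the linear-time recurrent form and the quadratic lower-triangular "attention-like" matrix form compute the same output, because the mixing matrix is semiseparable.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 6                            # sequence length
a = rng.uniform(0.5, 1.0, T)     # input-dependent (selective) decay coefficients
b = rng.normal(size=T)           # input projections B_t
c = rng.normal(size=T)           # output projections C_t
x = rng.normal(size=T)           # input sequence

# Recurrent (SSM) form: h_t = a_t * h_{t-1} + b_t * x_t,  y_t = c_t * h_t
h, y_rec = 0.0, np.zeros(T)
for t in range(T):
    h = a[t] * h + b[t] * x[t]
    y_rec[t] = c[t] * h

# Dual matrix form: y = M x with a lower-triangular semiseparable matrix
# M[t, s] = c_t * (a_t * a_{t-1} * ... * a_{s+1}) * b_s  for s <= t
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        M[t, s] = c[t] * np.prod(a[s + 1:t + 1]) * b[s]
y_mat = M @ x

assert np.allclose(y_rec, y_mat)  # both forms give the same sequence transformation
```

The recurrent pass costs O(T) time with O(1) state, while materializing M costs O(T²) but is highly parallel; trading off between these two views of the same semiseparable operator is what the paper's SSD algorithm is built around.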
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, Tri Dao+, ICML'24
URL
Authors
Abstract
Translation (by gpt-4o-mini)
Summary (by gpt-4o-mini)