
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, Tri Dao+, ICML'24 #1831

Open
AkihikoWatanabe opened this issue Mar 24, 2025 · 1 comment

AkihikoWatanabe (Owner) commented Mar 24, 2025

URL

Authors

  • Tri Dao
  • Albert Gu

Abstract

  • While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.

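A minimal NumPy sketch of the duality the abstract describes, in the simplest scalar-decay case (my own illustration, not code from the paper; the restriction to A_t = a_t·I, the shapes, and the variable names are assumptions): the same selective SSM can be evaluated either as a linear-time recurrence or as a quadratic, attention-like multiply by a lower-triangular 1-semiseparable matrix, and the two results agree.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4                      # sequence length, state dimension (toy sizes)
a = rng.uniform(0.5, 1.0, T)     # per-step scalar decay, i.e. A_t = a_t * I (assumed form)
B = rng.normal(size=(T, N))      # input projections B_t
C = rng.normal(size=(T, N))      # output projections C_t
x = rng.normal(size=T)           # 1-D input sequence (single channel)

# Linear (recurrent) form: h_t = a_t * h_{t-1} + B_t * x_t,   y_t = C_t . h_t
h = np.zeros(N)
y_recurrent = np.empty(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_recurrent[t] = C[t] @ h

# Quadratic (attention-like) form: y = M x, with M a lower-triangular
# 1-semiseparable matrix, M[t, s] = (C_t . B_s) * a_{s+1} * ... * a_t
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        M[t, s] = (C[t] @ B[s]) * np.prod(a[s + 1:t + 1])
y_quadratic = M @ x

print(np.allclose(y_recurrent, y_quadratic))  # True: the two forms agree
```

The recurrent form costs O(T) per channel, while the matrix form is O(T^2) but maps onto the same masked-matmul structure as attention; SSD exploits both views.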
Summary (by gpt-4o-mini)

  • Shows the close relationship between Transformers and state-space models (SSMs) such as Mamba, and builds theoretical connections between SSMs and variants of attention. The newly designed Mamba-2 is 2-8x faster while remaining competitive with Transformers.
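
Continuing the sketch above, a rough illustration of the block-wise ("chunked") evaluation that gives the SSD layer its speed: the quadratic form is used inside short chunks (matmul-friendly), and the recurrence only carries state across chunk boundaries. Again a toy construction under the same scalar-decay assumption, not the paper's implementation; the chunk size Q and all names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, Q = 8, 4, 4                 # sequence length, state dim, chunk length (Q divides T)
a = rng.uniform(0.5, 1.0, T)      # per-step scalar decay
B = rng.normal(size=(T, N))
C = rng.normal(size=(T, N))
x = rng.normal(size=T)

def reference(a, B, C, x):
    # plain recurrence, used only to check the chunked result
    h, out = np.zeros(B.shape[1]), np.empty(len(x))
    for t in range(len(x)):
        h = a[t] * h + B[t] * x[t]
        out[t] = C[t] @ h
    return out

y = np.empty(T)
h = np.zeros(N)                   # state carried across chunk boundaries
for c0 in range(0, T, Q):
    ac, Bc, Cc, xc = a[c0:c0+Q], B[c0:c0+Q], C[c0:c0+Q], x[c0:c0+Q]
    decay = np.cumprod(ac)        # decay[t] = a_{c0} * ... * a_{c0+t}
    # intra-chunk quadratic (attention-like) part
    M = np.zeros((Q, Q))
    for t in range(Q):
        for s in range(t + 1):
            M[t, s] = (Cc[t] @ Bc[s]) * np.prod(ac[s + 1:t + 1])
    y_intra = M @ xc
    # inter-chunk part: contribution of the state entering the chunk
    y_inter = np.array([decay[t] * (Cc[t] @ h) for t in range(Q)])
    y[c0:c0+Q] = y_intra + y_inter
    # carry the state to the end of the chunk (a small recurrence across chunks)
    h = decay[-1] * h + sum(np.prod(ac[s + 1:]) * Bc[s] * xc[s] for s in range(Q))

print(np.allclose(y, reference(a, B, C, x)))  # True: matches the plain recurrence
```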
AkihikoWatanabe (Owner, Author) commented

Read this if you want to know the details of Mamba2.

AkihikoWatanabe changed the title from "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, Tri Dao+, arXiv'24" to "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, Tri Dao+, ICML'24" on Mar 24, 2025