
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, Tri Dao+, ICML'24 #1831

Open
AkihikoWatanabe opened this issue Mar 24, 2025 · 1 comment

AkihikoWatanabe (Owner) commented Mar 24, 2025

URL

Authors

  • Tri Dao
  • Albert Gu

Abstract

  • While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.

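A minimal NumPy sketch of the duality the abstract describes, in the simplest scalar-decay case (my own illustration, not code from the paper; the restriction to A_t = a_t·I, the shapes, and the variable names are assumptions): the same selective SSM can be evaluated either as a linear-time recurrence or as a quadratic, attention-like multiply by a lower-triangular 1-semiseparable matrix, and the two results agree.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4                      # sequence length, state dimension (toy sizes)
a = rng.uniform(0.5, 1.0, T)     # per-step scalar decay, i.e. A_t = a_t * I (assumed form)
B = rng.normal(size=(T, N))      # input projections B_t
C = rng.normal(size=(T, N))      # output projections C_t
x = rng.normal(size=T)           # 1-D input sequence (single channel)

# Linear (recurrent) form: h_t = a_t * h_{t-1} + B_t * x_t,   y_t = C_t . h_t
h = np.zeros(N)
y_recurrent = np.empty(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_recurrent[t] = C[t] @ h

# Quadratic (attention-like) form: y = M x, with M a lower-triangular
# 1-semiseparable matrix, M[t, s] = (C_t . B_s) * a_{s+1} * ... * a_t
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        M[t, s] = (C[t] @ B[s]) * np.prod(a[s + 1:t + 1])
y_quadratic = M @ x

print(np.allclose(y_recurrent, y_quadratic))  # True: the two forms agree
```

The recurrent form costs O(T) per channel, while the matrix form is O(T^2) but maps onto the same masked-matmul structure as attention; SSD exploits both views.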
Summary (by gpt-4o-mini)

  • Shows the close relationship between Transformers and state-space models (SSMs) such as Mamba, and builds theoretical connections between SSMs and variants of attention. The newly designed Mamba-2 is 2-8x faster while remaining competitive with Transformers.
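
Continuing the sketch above, a rough illustration of the block-wise ("chunked") evaluation that gives the SSD layer its speed: the quadratic form is used inside short chunks (matmul-friendly), and the recurrence only carries state across chunk boundaries. Again a toy construction under the same scalar-decay assumption, not the paper's implementation; the chunk size Q and all names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, Q = 8, 4, 4                 # sequence length, state dim, chunk length (Q divides T)
a = rng.uniform(0.5, 1.0, T)      # per-step scalar decay
B = rng.normal(size=(T, N))
C = rng.normal(size=(T, N))
x = rng.normal(size=T)

def reference(a, B, C, x):
    # plain recurrence, used only to check the chunked result
    h, out = np.zeros(B.shape[1]), np.empty(len(x))
    for t in range(len(x)):
        h = a[t] * h + B[t] * x[t]
        out[t] = C[t] @ h
    return out

y = np.empty(T)
h = np.zeros(N)                   # state carried across chunk boundaries
for c0 in range(0, T, Q):
    ac, Bc, Cc, xc = a[c0:c0+Q], B[c0:c0+Q], C[c0:c0+Q], x[c0:c0+Q]
    decay = np.cumprod(ac)        # decay[t] = a_{c0} * ... * a_{c0+t}
    # intra-chunk quadratic (attention-like) part
    M = np.zeros((Q, Q))
    for t in range(Q):
        for s in range(t + 1):
            M[t, s] = (Cc[t] @ Bc[s]) * np.prod(ac[s + 1:t + 1])
    y_intra = M @ xc
    # inter-chunk part: contribution of the state entering the chunk
    y_inter = np.array([decay[t] * (Cc[t] @ h) for t in range(Q)])
    y[c0:c0+Q] = y_intra + y_inter
    # carry the state to the end of the chunk (a small recurrence across chunks)
    h = decay[-1] * h + sum(np.prod(ac[s + 1:]) * Bc[s] * xc[s] for s in range(Q))

print(np.allclose(y, reference(a, B, C, x)))  # True: matches the plain recurrence
```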
AkihikoWatanabe (Owner, Author) commented

Read this if you want to know the details of Mamba2.

AkihikoWatanabe changed the title from "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, Tri Dao+, arXiv'24" to "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, Tri Dao+, ICML'24" on Mar 24, 2025