ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition • arXiv:2402.15220 • Published Feb 23, 2024
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection • arXiv:2403.19888 • Published Mar 29, 2024
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • arXiv:2404.02258 • Published Apr 2, 2024 (see the routing sketch after this list)
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model • arXiv:2404.09967 • Published Apr 15, 2024
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality • arXiv:2405.21060 • Published May 31, 2024
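The Mixture-of-Depths entry above centers on letting a learned router decide, per block, which tokens receive full compute while the rest skip through the residual stream. Below is a minimal PyTorch sketch of that top-k routing idea; the class name `MoDBlock`, the `inner` sub-block standing in for attention + MLP, and the 12.5% capacity default are assumptions of this sketch, not the paper's implementation, and details such as how routing is handled at autoregressive inference time are omitted.

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Hypothetical sketch of a Mixture-of-Depths block: a scalar router
    scores each token, only the top-k tokens pass through the inner
    sub-block, and the rest flow through the residual stream unchanged."""

    def __init__(self, d_model: int, inner: nn.Module, capacity: float = 0.125):
        super().__init__()
        self.router = nn.Linear(d_model, 1)  # one routing score per token
        self.inner = inner                   # stand-in for attention + MLP
        self.capacity = capacity             # fraction of tokens given compute

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        k = max(1, int(t * self.capacity))
        scores = self.router(x).squeeze(-1)           # (b, t)
        topk = scores.topk(k, dim=-1).indices         # tokens that get compute
        idx = topk.unsqueeze(-1).expand(-1, -1, d)    # (b, k, d) gather index
        selected = x.gather(1, idx)                   # routed tokens only
        # Gate the block output by the router score so the routing
        # decision stays differentiable end to end.
        gate = torch.sigmoid(scores.gather(1, topk)).unsqueeze(-1)
        out = x.clone()                               # unrouted tokens pass through
        out.scatter_add_(1, idx, gate * self.inner(selected))
        return out

# Usage with a plain MLP as the (assumed) inner sub-block:
d = 64
inner = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
block = MoDBlock(d, inner)
y = block(torch.randn(2, 16, d))  # same shape as input: (2, 16, 64)
```

Because only `seq_len * capacity` tokens enter the inner sub-block, the block's attention/MLP cost shrinks proportionally, which is the compute-allocation effect the paper's title describes.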