Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM Paper • 2401.02994 • Published Jan 4 • 49
Repeat After Me: Transformers are Better than State Space Models at Copying Paper • 2402.01032 • Published Feb 1 • 22
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks Paper • 2402.04248 • Published Feb 6 • 30
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31 • 63
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4 • 37
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published Jul 5 • 27
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond Paper • 2410.02362 • Published Oct 3 • 17
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA Paper • 2410.20672 • Published Oct 28 • 6
SambaMixer: State of Health Prediction of Li-ion Batteries using Mamba State Space Models Paper • 2411.00233 • Published Oct 31 • 7
Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published Nov 20 • 38
Gated Delta Networks: Improving Mamba2 with Delta Rule Paper • 2412.06464 • Published Dec 9 • 9
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13 • 68