Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity Paper • 2501.16295 • Published 6 days ago • 6
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published Jul 5, 2024 • 28
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression Paper • 2406.14909 • Published Jun 21, 2024 • 15