Memory Augmented Language Models through Mixture of Word Experts • arXiv:2311.10768 • Published Nov 15, 2023
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts • arXiv:2401.04081 • Published Jan 8, 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models • arXiv:2401.06066 • Published Jan 11, 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models • arXiv:2402.01739 • Published Jan 29, 2024
Yuan 2.0-M32: Mixture of Experts with Attention Router • arXiv:2405.17976 • Published May 28, 2024