Yuan 2.0-M32: Mixture of Experts with Attention Router (arXiv:2405.17976, published May 28, 2024)
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs (arXiv:2503.05139, published March 7, 2025)