matlok's Collections
Papers - MoE - Training
Robust Mixture-of-Expert Training for Convolutional Neural Networks • arXiv:2308.10110
Experts Weights Averaging: A New General Training Scheme for Vision Transformers • arXiv:2308.06093
ConstitutionalExperts: Training a Mixture of Principle-based Prompts • arXiv:2403.04894
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models • arXiv:2403.03432
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models • arXiv:2402.14800
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization • arXiv:2402.12550
Buffer Overflow in Mixture of Experts • arXiv:2402.05526
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts • arXiv:2211.15841
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer • arXiv:1701.06538
LocMoE: A Low-overhead MoE for Large Language Model Training • arXiv:2401.13920
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale • arXiv:2201.05596
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism • arXiv:2304.11414
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models • arXiv:2401.06066
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts • arXiv:2312.07035
TinyLLaVA: A Framework of Small-scale Large Multimodal Models • arXiv:2402.14289
AMEND: A Mixture of Experts Framework for Long-tailed Trajectory Prediction • arXiv:2402.08698
Fast Inference of Mixture-of-Experts Language Models with Offloading • arXiv:2312.17238
Sparse Backpropagation for MoE Training • arXiv:2310.00811
FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts • arXiv:2306.08586
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts • arXiv:2306.04845
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM • arXiv:2403.07816
Unified Scaling Laws for Routed Language Models • arXiv:2202.01169
arXiv:2407.10671