matlok's Collections
Papers - MoE - Router
Turn Waste into Worth: Rectifying Top-k Router of MoE
Paper • 2402.12399 • Published • 2

CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Paper • 2402.02526 • Published • 3

Buffer Overflow in Mixture of Experts
Paper • 2402.05526 • Published • 8

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 26

LocMoE: A Low-overhead MoE for Large Language Model Training
Paper • 2401.13920 • Published • 2

HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts
Paper • 2312.07035 • Published • 2

Routers in Vision Mixture of Experts: An Empirical Study
Paper • 2401.15969 • Published • 2

BASE Layers: Simplifying Training of Large, Sparse Models
Paper • 2103.16716 • Published • 3

Hash Layers For Large Sparse Models
Paper • 2106.04426 • Published • 2

Direct Neural Machine Translation with Task-level Mixture of Experts models
Paper • 2310.12236 • Published • 2

Adaptive Gating in Mixture-of-Experts based Language Models
Paper • 2310.07188 • Published • 2

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Paper • 2310.01334 • Published • 3

Sparse Backpropagation for MoE Training
Paper • 2310.00811 • Published • 2

A Review of Sparse Expert Models in Deep Learning
Paper • 2209.01667 • Published • 3

SpeechMoE2: Mixture-of-Experts Model with Improved Routing
Paper • 2111.11831 • Published • 2

Towards More Effective and Economic Sparsely-Activated Model
Paper • 2110.07431 • Published • 2

Taming Sparsely Activated Transformer with Stochastic Experts
Paper • 2110.04260 • Published • 2

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Paper • 2110.03742 • Published • 3

Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
Paper • 2306.04845 • Published • 4

Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
Paper • 2306.04073 • Published • 2

Unified Scaling Laws for Routed Language Models
Paper • 2202.01169 • Published • 2