- Code release: https://github.com/microsoft/torchscale
- Sep 2022: accepted by NeurIPS 2022
- April 2022: released the preprint of **X-MoE** - [On the Representation Collapse of Sparse Mixture of Experts](https://arxiv.org/abs/2204.09179)

```
@inproceedings{xmoe,
  title={On the Representation Collapse of Sparse Mixture of Experts},
  author={Zewen Chi and Li Dong and Shaohan Huang and Damai Dai and Shuming Ma and Barun Patra and Saksham Singhal and Payal Bajaj and Xia Song and Xian-Ling Mao and Heyan Huang and Furu Wei},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022},
  url={https://openreview.net/forum?id=mWaYC6CZf5}
}
```