Suparious committed
Commit be49994
1 Parent(s): b3e2b88

Update README.md

Files changed (1)
  1. README.md +0 -5
README.md CHANGED
@@ -136,11 +136,6 @@ A model to test how MoE will route without square expansion.
 
 ### "[What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)"
 
-<a href='https://ko-fi.com/S6S2UH2TC' target='_blank'><img height='38' style='border:0px;height:36px;' src='https://storage.ko-fi.com/cdn/kofi1.png?v=3' border='0' alt='Buy Me a Coffee at ko-fi.com' /></a>
-<a href='https://discord.gg/KFS229xD' target='_blank'><img width='140' height='500' style='border:0px;height:36px;' src='https://i.ibb.co/tqwznYM/Discord-button.png' border='0' alt='Join Our Discord!' /></a>
-
-### (from the MistralAI papers...click the quoted question above to navigate to it directly.)
-
 The scale of a model is one of the most important axes for better model quality. Given a fixed computing budget, training a larger model for fewer steps is better than training a smaller model for more steps.
 
 Mixture of Experts enable models to be pretrained with far less compute, which means you can dramatically scale up the model or dataset size with the same compute budget as a dense model. In particular, a MoE model should achieve the same quality as its dense counterpart much faster during pretraining.
 
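The retained paragraphs describe why sparse expert routing saves compute: each token activates only a few experts, so per-token cost stays close to a single dense feed-forward layer even as total parameters grow. As a rough illustration of that routing idea only (not this repository's implementation; the class name, layer sizes, and top-k choice below are arbitrary assumptions), a minimal PyTorch sketch of top-k routing could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Toy top-k Mixture-of-Experts feed-forward block (illustration only)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is an ordinary feed-forward sub-network.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ]
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten to individual tokens for routing.
        tokens = x.reshape(-1, x.size(-1))
        scores = self.router(tokens)                       # (n_tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)               # renormalize over the chosen experts

        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx              # tokens routed to this expert in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)


# Example: tokens of width 16 routed through 4 experts, 2 active per token.
layer = TopKMoE(d_model=16, d_ff=64, n_experts=4, top_k=2)
print(layer(torch.randn(2, 4, 16)).shape)  # torch.Size([2, 4, 16])
```

Because only `top_k` of the `n_experts` feed-forward blocks run for any given token, parameter count scales with the number of experts while per-token compute stays roughly constant, which is the trade-off the quoted blog post describes.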