Update README.md
README.md
@@ -231,7 +231,7 @@ Safiyyah Saleem, Holger Schwenk, and Jeff Wang.
## Training:
- Expert Output Masking is used during training; it consists of dropping the full expert contribution for some tokens.
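The masking described above can be sketched as follows. This is a minimal illustration of the idea, not the library's actual implementation; the function name, arguments, and the per-token masking probability are all hypothetical.

```python
import numpy as np

def expert_output_masking(expert_output, rng, mask_prob=0.2):
    # Expert Output Masking sketch (hypothetical helper): drop the *entire*
    # expert contribution for a random subset of tokens, i.e. a masked token
    # receives zeros from this expert instead of its routed output.
    # expert_output: (num_tokens, hidden_dim) array of expert outputs.
    keep = (rng.random(expert_output.shape[0]) >= mask_prob)[:, None]
    return expert_output * keep
```

Note that the whole hidden vector for a masked token is zeroed at once, as opposed to dropout-style schemes that zero individual activations independently.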
## Generating with NLLB-MoE
The available checkpoints require around 350GB of storage. Make sure to use `accelerate` if you do not have enough RAM on your machine.
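Loading with `accelerate`-managed device placement might look like the sketch below. The checkpoint id `facebook/nllb-moe-54b` and the helper function are assumptions for illustration; `device_map="auto"` requires `accelerate` to be installed, and actually running this downloads the full (~350GB) checkpoint.

```python
def load_nllb_moe(checkpoint: str = "facebook/nllb-moe-54b"):
    """Sketch: load NLLB-MoE with accelerate's automatic device placement.

    With device_map="auto", accelerate shards the weights across available
    GPUs and offloads the remainder to CPU RAM or disk as needed.
    """
    # Lazy import so the heavy dependencies load only when the function runs.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, device_map="auto")
    return tokenizer, model
```

After loading, `tokenizer, model = load_nllb_moe()` gives you the usual seq2seq pair for generation.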