Update README.md
README.md
@@ -231,7 +231,7 @@ Safiyyah Saleem, Holger Schwenk, and Jeff Wang.
## Training:
- Expert Output Masking is used during training; it consists of dropping the full expert contribution for some tokens.
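The masking described above can be sketched as follows. This is a minimal illustration of the idea, not the library's actual implementation; the function name, arguments, and the per-token masking probability are all hypothetical.

```python
import numpy as np

def expert_output_masking(expert_output, rng, mask_prob=0.2):
    # Expert Output Masking sketch (hypothetical helper): drop the *entire*
    # expert contribution for a random subset of tokens, i.e. a masked token
    # receives zeros from this expert instead of its routed output.
    # expert_output: (num_tokens, hidden_dim) array of expert outputs.
    keep = (rng.random(expert_output.shape[0]) >= mask_prob)[:, None]
    return expert_output * keep
```

Note that the whole hidden vector for a masked token is zeroed at once, as opposed to dropout-style schemes that zero individual activations independently.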
## Generating with NLLB-MoE
The available checkpoints require around 350GB of storage. Make sure to use `accelerate` if you do not have enough RAM on your machine.
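Loading with `accelerate`-managed device placement might look like the sketch below. The checkpoint id `facebook/nllb-moe-54b` and the helper function are assumptions for illustration; `device_map="auto"` requires `accelerate` to be installed, and actually running this downloads the full (~350GB) checkpoint.

```python
def load_nllb_moe(checkpoint: str = "facebook/nllb-moe-54b"):
    """Sketch: load NLLB-MoE with accelerate's automatic device placement.

    With device_map="auto", accelerate shards the weights across available
    GPUs and offloads the remainder to CPU RAM or disk as needed.
    """
    # Lazy import so the heavy dependencies load only when the function runs.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, device_map="auto")
    return tokenizer, model
```

After loading, `tokenizer, model = load_nllb_moe()` gives you the usual seq2seq pair for generation.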