Translation · Transformers · PyTorch · nllb-moe · feature-extraction
ArthurZ committed
Commit a12e84c
1 Parent(s): 47998b5

Update README.md

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -228,6 +228,11 @@ Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarle
Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers,
Safiyyah Saleem, Holger Schwenk, and Jeff Wang.

+ ## Training:
+
+ - Expert Output Masking is used during training; it consists of dropping the full expert contribution for some tokens, as shown in the following scheme:
+ ![EOM](https://drive.google.com/uc?id=1VNr3Ug5mQT4uFlvMDaTEyfg9rwbwGFsl/view?usp=sharing)
+
## Generating with NLLB-MoE
The available checkpoints require around 350GB of storage. Make sure to use `accelerate` if you do not have enough RAM on your machine.
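As an aside on the Expert Output Masking note added in this commit: the description ("drop the full expert contribution for some tokens") can be pictured with a minimal sketch. The `expert_output_masking` helper and the `drop_prob` value below are hypothetical, illustrative names, not the NLLB-MoE implementation in `transformers`.

```python
import torch

def expert_output_masking(expert_output: torch.Tensor, drop_prob: float = 0.2) -> torch.Tensor:
    """Illustrative sketch of Expert Output Masking (hypothetical helper, not the library code).

    For a randomly chosen subset of tokens, the *entire* expert contribution is zeroed out,
    rather than masking individual hidden units as standard dropout would.
    expert_output: tensor of shape (batch, seq_len, hidden_dim).
    """
    # One Bernoulli draw per token: True = keep the expert contribution, False = drop it entirely.
    keep = torch.rand(expert_output.shape[:-1], device=expert_output.device) > drop_prob
    return expert_output * keep.unsqueeze(-1).to(expert_output.dtype)


# Toy usage: mask the expert output of a (batch=2, seq_len=5, hidden=8) tensor.
x = torch.randn(2, 5, 8)
print(expert_output_masking(x, drop_prob=0.2).shape)  # torch.Size([2, 5, 8])
```

The point the README makes is that the masking is applied per token, removing that token's whole routed-expert output, not a random fraction of its features.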
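On the "Generating with NLLB-MoE" note about `accelerate`: below is a minimal sketch of what loading with weight dispatch could look like, assuming the `facebook/nllb-moe-54b` checkpoint and an environment where `accelerate` is installed so that `device_map="auto"` can shard or offload the weights; the language codes and input sentence are only examples.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumes the facebook/nllb-moe-54b checkpoint (~350GB of weights) and `accelerate` installed,
# so device_map="auto" can spread the model across available GPUs / CPU offload.
model_name = "facebook/nllb-moe-54b"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),  # example target language: French
    max_length=50,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```

With `device_map="auto"`, `accelerate` places each layer on whatever device has room and offloads the rest, which is what makes the 350GB checkpoint usable on machines without that much RAM.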