Translation · Transformers · PyTorch · nllb-moe · feature-extraction
ArthurZ committed
Commit a12e84c
1 Parent(s): 47998b5

Update README.md

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -228,6 +228,11 @@ Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarle
Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers,
Safiyyah Saleem, Holger Schwenk, and Jeff Wang.

+ ## Training:
+
+ - Expert Output Masking is used during training; it consists of dropping the full expert contribution for some tokens, as shown in the following scheme:
+ ![EOM](https://drive.google.com/uc?id=1VNr3Ug5mQT4uFlvMDaTEyfg9rwbwGFsl/view?usp=sharing)
+
## Generating with NLLB-MoE
The available checkpoints require around 350GB of storage. Make sure to use `accelerate` if you do not have enough RAM on your machine.
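As an aside on the Expert Output Masking note added in this commit: the description ("drop the full expert contribution for some tokens") can be pictured with a minimal sketch. The `expert_output_masking` helper and the `drop_prob` value below are hypothetical, illustrative names, not the NLLB-MoE implementation in `transformers`.

```python
import torch

def expert_output_masking(expert_output: torch.Tensor, drop_prob: float = 0.2) -> torch.Tensor:
    """Illustrative sketch of Expert Output Masking (hypothetical helper, not the library code).

    For a randomly chosen subset of tokens, the *entire* expert contribution is zeroed out,
    rather than masking individual hidden units as standard dropout would.
    expert_output: tensor of shape (batch, seq_len, hidden_dim).
    """
    # One Bernoulli draw per token: True = keep the expert contribution, False = drop it entirely.
    keep = torch.rand(expert_output.shape[:-1], device=expert_output.device) > drop_prob
    return expert_output * keep.unsqueeze(-1).to(expert_output.dtype)


# Toy usage: mask the expert output of a (batch=2, seq_len=5, hidden=8) tensor.
x = torch.randn(2, 5, 8)
print(expert_output_masking(x, drop_prob=0.2).shape)  # torch.Size([2, 5, 8])
```

The point the README makes is that the masking is applied per token, removing that token's whole routed-expert output, not a random fraction of its features.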
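On the "Generating with NLLB-MoE" note about `accelerate`: below is a minimal sketch of what loading with weight dispatch could look like, assuming the `facebook/nllb-moe-54b` checkpoint and an environment where `accelerate` is installed so that `device_map="auto"` can shard or offload the weights; the language codes and input sentence are only examples.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumes the facebook/nllb-moe-54b checkpoint (~350GB of weights) and `accelerate` installed,
# so device_map="auto" can spread the model across available GPUs / CPU offload.
model_name = "facebook/nllb-moe-54b"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),  # example target language: French
    max_length=50,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```

With `device_map="auto"`, `accelerate` places each layer on whatever device has room and offloads the rest, which is what makes the 350GB checkpoint usable on machines without that much RAM.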