KnutJaegersberg commited on
Commit
1492ffe
·
1 Parent(s): dbf9075

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +272 -0
README.md ADDED
@@ -0,0 +1,272 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language:
3
+ - ace
4
+ - acm
5
+ - acq
6
+ - aeb
7
+ - af
8
+ - ajp
9
+ - ak
10
+ - als
11
+ - am
12
+ - apc
13
+ - ar
14
+ - ars
15
+ - ary
16
+ - arz
17
+ - as
18
+ - ast
19
+ - awa
20
+ - ayr
21
+ - azb
22
+ - azj
23
+ - ba
24
+ - bm
25
+ - ban
26
+ - be
27
+ - bem
28
+ - bn
29
+ - bho
30
+ - bjn
31
+ - bo
32
+ - bs
33
+ - bug
34
+ - bg
35
+ - ca
36
+ - ceb
37
+ - cs
38
+ - cjk
39
+ - ckb
40
+ - crh
41
+ - cy
42
+ - da
43
+ - de
44
+ - dik
45
+ - dyu
46
+ - dz
47
+ - el
48
+ - en
49
+ - eo
50
+ - et
51
+ - eu
52
+ - ee
53
+ - fo
54
+ - fj
55
+ - fi
56
+ - fon
57
+ - fr
58
+ - fur
59
+ - fuv
60
+ - gaz
61
+ - gd
62
+ - ga
63
+ - gl
64
+ - gn
65
+ - gu
66
+ - ht
67
+ - ha
68
+ - he
69
+ - hi
70
+ - hne
71
+ - hr
72
+ - hu
73
+ - hy
74
+ - ig
75
+ - ilo
76
+ - id
77
+ - is
78
+ - it
79
+ - jv
80
+ - ja
81
+ - kab
82
+ - kac
83
+ - kam
84
+ - kn
85
+ - ks
86
+ - ka
87
+ - kk
88
+ - kbp
89
+ - kea
90
+ - khk
91
+ - km
92
+ - ki
93
+ - rw
94
+ - ky
95
+ - kmb
96
+ - kmr
97
+ - knc
98
+ - kg
99
+ - ko
100
+ - lo
101
+ - lij
102
+ - li
103
+ - ln
104
+ - lt
105
+ - lmo
106
+ - ltg
107
+ - lb
108
+ - lua
109
+ - lg
110
+ - luo
111
+ - lus
112
+ - lvs
113
+ - mag
114
+ - mai
115
+ - ml
116
+ - mar
117
+ - min
118
+ - mk
119
+ - mt
120
+ - mni
121
+ - mos
122
+ - mi
123
+ - my
124
+ - nl
125
+ - nn
126
+ - nb
127
+ - npi
128
+ - nso
129
+ - nus
130
+ - ny
131
+ - oc
132
+ - ory
133
+ - pag
134
+ - pa
135
+ - pap
136
+ - pbt
137
+ - pes
138
+ - plt
139
+ - pl
140
+ - pt
141
+ - prs
142
+ - quy
143
+ - ro
144
+ - rn
145
+ - ru
146
+ - sg
147
+ - sa
148
+ - sat
149
+ - scn
150
+ - shn
151
+ - si
152
+ - sk
153
+ - sl
154
+ - sm
155
+ - sn
156
+ - sd
157
+ - so
158
+ - st
159
+ - es
160
+ - sc
161
+ - sr
162
+ - ss
163
+ - su
164
+ - sv
165
+ - swh
166
+ - szl
167
+ - ta
168
+ - taq
169
+ - tt
170
+ - te
171
+ - tg
172
+ - tl
173
+ - th
174
+ - ti
175
+ - tpi
176
+ - tn
177
+ - ts
178
+ - tk
179
+ - tum
180
+ - tr
181
+ - tw
182
+ - tzm
183
+ - ug
184
+ - uk
185
+ - umb
186
+ - ur
187
+ - uzn
188
+ - vec
189
+ - vi
190
+ - war
191
+ - wo
192
+ - xh
193
+ - ydd
194
+ - yo
195
+ - yue
196
+ - zh
197
+ - zsm
198
+ - zu
199
+
200
+ language_details: "ace_Arab, ace_Latn, acm_Arab, acq_Arab, aeb_Arab, afr_Latn, ajp_Arab, aka_Latn, amh_Ethi, apc_Arab, arb_Arab, ars_Arab, ary_Arab, arz_Arab, asm_Beng, ast_Latn, awa_Deva, ayr_Latn, azb_Arab, azj_Latn, bak_Cyrl, bam_Latn, ban_Latn,bel_Cyrl, bem_Latn, ben_Beng, bho_Deva, bjn_Arab, bjn_Latn, bod_Tibt, bos_Latn, bug_Latn, bul_Cyrl, cat_Latn, ceb_Latn, ces_Latn, cjk_Latn, ckb_Arab, crh_Latn, cym_Latn, dan_Latn, deu_Latn, dik_Latn, dyu_Latn, dzo_Tibt, ell_Grek, eng_Latn, epo_Latn, est_Latn, eus_Latn, ewe_Latn, fao_Latn, pes_Arab, fij_Latn, fin_Latn, fon_Latn, fra_Latn, fur_Latn, fuv_Latn, gla_Latn, gle_Latn, glg_Latn, grn_Latn, guj_Gujr, hat_Latn, hau_Latn, heb_Hebr, hin_Deva, hne_Deva, hrv_Latn, hun_Latn, hye_Armn, ibo_Latn, ilo_Latn, ind_Latn, isl_Latn, ita_Latn, jav_Latn, jpn_Jpan, kab_Latn, kac_Latn, kam_Latn, kan_Knda, kas_Arab, kas_Deva, kat_Geor, knc_Arab, knc_Latn, kaz_Cyrl, kbp_Latn, kea_Latn, khm_Khmr, kik_Latn, kin_Latn, kir_Cyrl, kmb_Latn, kon_Latn, kor_Hang, kmr_Latn, lao_Laoo, lvs_Latn, lij_Latn, lim_Latn, lin_Latn, lit_Latn, lmo_Latn, ltg_Latn, ltz_Latn, lua_Latn, lug_Latn, luo_Latn, lus_Latn, mag_Deva, mai_Deva, mal_Mlym, mar_Deva, min_Latn, mkd_Cyrl, plt_Latn, mlt_Latn, mni_Beng, khk_Cyrl, mos_Latn, mri_Latn, zsm_Latn, mya_Mymr, nld_Latn, nno_Latn, nob_Latn, npi_Deva, nso_Latn, nus_Latn, nya_Latn, oci_Latn, gaz_Latn, ory_Orya, pag_Latn, pan_Guru, pap_Latn, pol_Latn, por_Latn, prs_Arab, pbt_Arab, quy_Latn, ron_Latn, run_Latn, rus_Cyrl, sag_Latn, san_Deva, sat_Beng, scn_Latn, shn_Mymr, sin_Sinh, slk_Latn, slv_Latn, smo_Latn, sna_Latn, snd_Arab, som_Latn, sot_Latn, spa_Latn, als_Latn, srd_Latn, srp_Cyrl, ssw_Latn, sun_Latn, swe_Latn, swh_Latn, szl_Latn, tam_Taml, tat_Cyrl, tel_Telu, tgk_Cyrl, tgl_Latn, tha_Thai, tir_Ethi, taq_Latn, taq_Tfng, tpi_Latn, tsn_Latn, tso_Latn, tuk_Latn, tum_Latn, tur_Latn, twi_Latn, tzm_Tfng, uig_Arab, ukr_Cyrl, umb_Latn, urd_Arab, uzn_Latn, vec_Latn, vie_Latn, war_Latn, wol_Latn, xho_Latn, ydd_Hebr, yor_Latn, yue_Hant, zho_Hans, zho_Hant, zul_Latn"
201
+
202
+ tags:
203
+ - translation
204
+ license: "cc-by-nc-4.0"
205
+ datasets:
206
+ - flores-200
207
+ metrics:
208
+ - bleu
209
+ - spbleu
210
+ - chrf++
211
+ inference: false
212
+ ---
213
+
214
+ # NLLB-MoE
215
+
216
+ This is the model card of NLLB-MoE variant.
217
+
218
+ - Information about training algorithms, parameters, fairness constraints or other applied approaches, and features. The exact training algorithm, data and the strategies to handle data imbalances for high and low resource languages that were used to train NLLB-200 is described in the paper.
219
+ - Paper or other resource for more information NLLB Team et al, No Language Left Behind: Scaling Human-Centered Machine Translation, Arxiv, 2022
220
+ - License: CC-BY-NC
221
+ - Where to send questions or comments about the model: https://github.com/facebookresearch/fairseq/issues
222
+
223
+ The NLLB model was presented in [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by Marta R. Costa-jussà, James Cross, Onur Çelebi,
224
+ Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula,
225
+ Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews,
226
+ Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers,
227
+ Safiyyah Saleem, Holger Schwenk, and Jeff Wang.
228
+
229
+ ## Training:
230
+
231
+ - The Expert Output Masking is used for training, which consists in droping the full contribution for some tokens. This corresponds to the following scheme:
232
+ ![EOM](https://drive.google.com/uc?export=view&id=1VNr3Ug5mQT4uFlvMDaTEyfg9rwbwGFsl)
233
+
234
+ ## Generating with NLLB-MoE
235
+ The avalable checkpoints requires around 350GB of storage. Make sure to use `accelerate` if you do not have enough RAM on your machine.
236
+
237
+ While generating the target text set the `forced_bos_token_id` to the target language id. The following
238
+ example shows how to translate English to French using the *facebook/nllb-moe-54b* model.
239
+
240
+ Note that we're using the BCP-47 code for French `fra_Latn`. See [here](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200)
241
+ for the list of all BCP-47 in the Flores 200 dataset.
242
+
243
+ ```python
244
+ >>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
245
+
246
+ >>> tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-moe-54b")
247
+ >>> model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-moe-54b")
248
+
249
+ >>> batched_input = [
250
+ 'We now have 4-month-old mice that are non-diabetic that used to be diabetic," he added.',
251
+ "Dr. Ehud Ur, professor of medicine at Dalhousie University in Halifax, Nova Scotia and chair of the clinical and scientific division of the Canadian Diabetes Association cautioned that the research is still in its early days."
252
+ "Like some other experts, he is skeptical about whether diabetes can be cured, noting that these findings have no relevance to people who already have Type 1 diabetes."
253
+ "On Monday, Sara Danius, permanent secretary of the Nobel Committee for Literature at the Swedish Academy, publicly announced during a radio program on Sveriges Radio in Sweden the committee, unable to reach Bob Dylan directly about winning the 2016 Nobel Prize in Literature, had abandoned its efforts to reach him.",
254
+ 'Danius said, "Right now we are doing nothing. I have called and sent emails to his closest collaborator and received very friendly replies. For now, that is certainly enough."',
255
+ "Previously, Ring's CEO, Jamie Siminoff, remarked the company started when his doorbell wasn't audible from his shop in his garage.",
256
+ ]
257
+ >>> inputs = tokenizer(batched_input, return_tensors="pt", padding = True)
258
+
259
+ >>> translated_tokens = model.generate(
260
+ ... **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["fra_Latn"]
261
+ ... )
262
+ >>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)
263
+ ['"Nous avons maintenant des souris de 4 mois non diabétiques qui étaient diabétiques", a-t-il ajouté.',
264
+ "Le docteur Ehud Ur, professeur de médecine à l'université Dalhousie, à Halifax, en Nouvelle-Écosse, et président de la division clinique et scientifique de l'Association canadienne du diabète, prévient que la recherche n'en est qu'à ses débuts.",
265
+ "Comme d'autres spécialistes, il est sceptique quant à la guérison du diabète, notant que ces résultats ne sont pas pertinents pour les personnes atteintes de diabète de type 1.",
266
+ "Lundi, Sara Danius, secrétaire permanente du Comité Nobel de littérature à l'Académie suédoise, a annoncé publiquement lors d'une émission de radio sur Sveriges Radio en Suède que le comité, incapable de contacter Bob Dylan directement au sujet du prix Nobel de littérature 2016, avait abandonné ses efforts pour le joindre.",
267
+ "Danius a déclaré: \"Pour le moment, nous ne faisons rien. J'ai appelé et envoyé des courriels à son plus proche collaborateur et j'ai reçu des réponses très amicales. Pour l'instant, c'est certainement suffisant\".",
268
+ "Auparavant, le PDG de Ring, Jamie Siminoff, a fait remarquer que la société avait commencé lorsque sa sonnette n'était pas audible depuis son magasin dans son garage.",
269
+ "Il a construit une sonnette WiFi, il a dit.",
270
+ ]
271
+ ```
272
+