lbourdois committed on
Commit
7d9dc5e
1 Parent(s): 297daca

Add multilingual to the language tag


Hi! This PR adds `multilingual` to the `language` tag to improve the model's referencing.
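To check what this change does, the sketch below reads the `language` list out of the model card's YAML front matter and confirms that `multilingual` is present. It assumes an abbreviated excerpt of the front matter that contains only the entries visible in the diff. The `block_list` helper is hypothetical and uses a deliberately simplified line-based parser. For real model cards, a full YAML loader would be the more robust choice.

```python
# Abbreviated excerpt of the README front matter after this PR: only the
# entries visible in the diff are reproduced, not the full language list.
FRONT_MATTER = """\
language:
- nb
- nn
- sv
- multilingual
license: cc-by-4.0
tags:
- translation
- opus-mt-tc
"""


def block_list(front_matter: str, key: str) -> list[str]:
    """Return the items of a top-level YAML block list such as `language:`.

    Hypothetical, simplified helper: it only understands `key:` followed by
    `- item` lines and is not a general YAML parser.
    """
    items: list[str] = []
    in_list = False
    for line in front_matter.splitlines():
        stripped = line.strip()
        if not in_list:
            # Skip lines until the one that opens the requested list.
            in_list = stripped == f"{key}:"
            continue
        if stripped.startswith("- "):
            items.append(stripped[2:].strip())
        else:
            break  # the first non-item line closes the list
    return items


languages = block_list(FRONT_MATTER, "language")
print(languages)                    # ['nb', 'nn', 'sv', 'multilingual']
print("multilingual" in languages)  # True
```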

Files changed (1)
  1. README.md +146 -330
README.md CHANGED
@@ -5,360 +5,176 @@ language:
5
  - nb
6
  - nn
7
  - sv
8
-
9
  tags:
10
  - translation
11
  - opus-mt-tc
12
-
13
- license: cc-by-4.0
14
  model-index:
15
  - name: opus-mt-tc-big-gmq-gmq
16
  results:
17
  - task:
18
- name: Translation isl-swe
19
  type: translation
20
- args: isl-swe
21
  dataset:
22
  name: europeana2021
23
  type: europeana2021
24
  args: isl-swe
25
  metrics:
26
- - name: BLEU
27
- type: bleu
28
- value: 22.2
29
- - name: chr-F
30
- type: chrf
31
- value: 0.45562
32
  - task:
33
- name: Translation nob-isl
34
  type: translation
35
- args: nob-isl
36
- dataset:
37
- name: europeana2021
38
- type: europeana2021
39
- args: nob-isl
40
- metrics:
41
- - name: BLEU
42
- type: bleu
43
- value: 29.7
44
- - name: chr-F
45
- type: chrf
46
- value: 0.54171
47
- - task:
48
- name: Translation nob-swe
49
- type: translation
50
- args: nob-swe
51
- dataset:
52
- name: europeana2021
53
- type: europeana2021
54
- args: nob-swe
55
- metrics:
56
- - name: BLEU
57
- type: bleu
58
- value: 54.0
59
- - name: chr-F
60
- type: chrf
61
- value: 0.73891
62
- - task:
63
  name: Translation dan-isl
64
- type: translation
65
- args: dan-isl
66
  dataset:
67
  name: flores101-devtest
68
  type: flores_101
69
  args: dan isl devtest
70
  metrics:
71
- - name: BLEU
72
- type: bleu
73
- value: 22.2
74
- - name: chr-F
75
- type: chrf
76
- value: 0.50227
77
- - task:
78
- name: Translation dan-nob
79
- type: translation
80
- args: dan-nob
81
- dataset:
82
- name: flores101-devtest
83
- type: flores_101
84
- args: dan nob devtest
85
- metrics:
86
- - name: BLEU
87
- type: bleu
88
- value: 28.6
89
- - name: chr-F
90
- type: chrf
91
- value: 0.58445
92
- - task:
93
- name: Translation dan-swe
94
- type: translation
95
- args: dan-swe
96
- dataset:
97
- name: flores101-devtest
98
- type: flores_101
99
- args: dan swe devtest
100
- metrics:
101
- - name: BLEU
102
- type: bleu
103
- value: 38.5
104
- - name: chr-F
105
- type: chrf
106
- value: 0.65000
107
- - task:
108
- name: Translation isl-dan
109
- type: translation
110
- args: isl-dan
111
- dataset:
112
- name: flores101-devtest
113
- type: flores_101
114
- args: isl dan devtest
115
- metrics:
116
- - name: BLEU
117
- type: bleu
118
- value: 27.2
119
- - name: chr-F
120
- type: chrf
121
- value: 0.53630
122
- - task:
123
- name: Translation isl-nob
124
- type: translation
125
- args: isl-nob
126
- dataset:
127
- name: flores101-devtest
128
- type: flores_101
129
- args: isl nob devtest
130
- metrics:
131
- - name: BLEU
132
- type: bleu
133
- value: 20.5
134
- - name: chr-F
135
- type: chrf
136
- value: 0.49434
137
- - task:
138
- name: Translation isl-swe
139
- type: translation
140
- args: isl-swe
141
- dataset:
142
- name: flores101-devtest
143
- type: flores_101
144
- args: isl swe devtest
145
- metrics:
146
- - name: BLEU
147
- type: bleu
148
- value: 26.0
149
- - name: chr-F
150
- type: chrf
151
- value: 0.53373
152
- - task:
153
- name: Translation nob-dan
154
- type: translation
155
- args: nob-dan
156
- dataset:
157
- name: flores101-devtest
158
- type: flores_101
159
- args: nob dan devtest
160
- metrics:
161
- - name: BLEU
162
- type: bleu
163
- value: 31.7
164
- - name: chr-F
165
- type: chrf
166
- value: 0.59657
167
- - task:
168
- name: Translation nob-isl
169
- type: translation
170
- args: nob-isl
171
- dataset:
172
- name: flores101-devtest
173
- type: flores_101
174
- args: nob isl devtest
175
- metrics:
176
- - name: BLEU
177
- type: bleu
178
- value: 18.9
179
- - name: chr-F
180
- type: chrf
181
- value: 0.47432
182
- - task:
183
- name: Translation nob-swe
184
- type: translation
185
- args: nob-swe
186
- dataset:
187
- name: flores101-devtest
188
- type: flores_101
189
- args: nob swe devtest
190
- metrics:
191
- - name: BLEU
192
- type: bleu
193
- value: 31.3
194
- - name: chr-F
195
- type: chrf
196
- value: 0.60030
197
- - task:
198
- name: Translation swe-dan
199
- type: translation
200
- args: swe-dan
201
- dataset:
202
- name: flores101-devtest
203
- type: flores_101
204
- args: swe dan devtest
205
- metrics:
206
- - name: BLEU
207
- type: bleu
208
- value: 39.0
209
- - name: chr-F
210
- type: chrf
211
- value: 0.64340
212
- - task:
213
- name: Translation swe-isl
214
- type: translation
215
- args: swe-isl
216
- dataset:
217
- name: flores101-devtest
218
- type: flores_101
219
- args: swe isl devtest
220
- metrics:
221
- - name: BLEU
222
- type: bleu
223
- value: 21.7
224
- - name: chr-F
225
- type: chrf
226
- value: 0.49590
227
  - task:
228
- name: Translation swe-nob
229
  type: translation
230
- args: swe-nob
231
- dataset:
232
- name: flores101-devtest
233
- type: flores_101
234
- args: swe nob devtest
235
- metrics:
236
- - name: BLEU
237
- type: bleu
238
- value: 28.9
239
- - name: chr-F
240
- type: chrf
241
- value: 0.58336
242
- - task:
243
  name: Translation dan-nob
244
- type: translation
245
- args: dan-nob
246
  dataset:
247
  name: tatoeba-test-v2021-08-07
248
  type: tatoeba_mt
249
  args: dan-nob
250
  metrics:
251
- - name: BLEU
252
- type: bleu
253
- value: 78.2
254
- - name: chr-F
255
- type: chrf
256
- value: 0.87556
257
- - task:
258
- name: Translation dan-swe
259
- type: translation
260
- args: dan-swe
261
- dataset:
262
- name: tatoeba-test-v2021-08-07
263
- type: tatoeba_mt
264
- args: dan-swe
265
- metrics:
266
- - name: BLEU
267
- type: bleu
268
- value: 72.5
269
- - name: chr-F
270
- type: chrf
271
- value: 0.83556
272
- - task:
273
- name: Translation nno-nob
274
- type: translation
275
- args: nno-nob
276
- dataset:
277
- name: tatoeba-test-v2021-08-07
278
- type: tatoeba_mt
279
- args: nno-nob
280
- metrics:
281
- - name: BLEU
282
- type: bleu
283
- value: 78.9
284
- - name: chr-F
285
- type: chrf
286
- value: 0.88349
287
- - task:
288
- name: Translation nob-dan
289
- type: translation
290
- args: nob-dan
291
- dataset:
292
- name: tatoeba-test-v2021-08-07
293
- type: tatoeba_mt
294
- args: nob-dan
295
- metrics:
296
- - name: BLEU
297
- type: bleu
298
- value: 73.9
299
- - name: chr-F
300
- type: chrf
301
- value: 0.85345
302
- - task:
303
- name: Translation nob-nno
304
- type: translation
305
- args: nob-nno
306
- dataset:
307
- name: tatoeba-test-v2021-08-07
308
- type: tatoeba_mt
309
- args: nob-nno
310
- metrics:
311
- - name: BLEU
312
- type: bleu
313
- value: 55.2
314
- - name: chr-F
315
- type: chrf
316
- value: 0.74571
317
- - task:
318
- name: Translation nob-swe
319
- type: translation
320
- args: nob-swe
321
- dataset:
322
- name: tatoeba-test-v2021-08-07
323
- type: tatoeba_mt
324
- args: nob-swe
325
- metrics:
326
- - name: BLEU
327
- type: bleu
328
- value: 73.9
329
- - name: chr-F
330
- type: chrf
331
- value: 0.84747
332
- - task:
333
- name: Translation swe-dan
334
- type: translation
335
- args: swe-dan
336
- dataset:
337
- name: tatoeba-test-v2021-08-07
338
- type: tatoeba_mt
339
- args: swe-dan
340
- metrics:
341
- - name: BLEU
342
- type: bleu
343
- value: 72.6
344
- - name: chr-F
345
- type: chrf
346
- value: 0.83392
347
- - task:
348
- name: Translation swe-nob
349
- type: translation
350
- args: swe-nob
351
- dataset:
352
- name: tatoeba-test-v2021-08-07
353
- type: tatoeba_mt
354
- args: swe-nob
355
- metrics:
356
- - name: BLEU
357
- type: bleu
358
- value: 76.3
359
- - name: chr-F
360
- type: chrf
361
- value: 0.85815
362
  ---
363
  # opus-mt-tc-big-gmq-gmq
364
 
@@ -415,7 +231,7 @@ from transformers import MarianMTModel, MarianTokenizer
415
 
416
  src_text = [
417
  ">>fao<< Jeg er bange for kakerlakker.",
418
- ">>nob<< Vladivostok är en stad i Ryssland."
419
  ]
420
 
421
  model_name = "pytorch-models/opus-mt-tc-big-gmq-gmq"
@@ -427,7 +243,7 @@ for t in translated:
427
  print( tokenizer.decode(t, skip_special_tokens=True) )
428
 
429
  # expected output:
430
- # Tað eru uml.
431
  # Vladivostok er en by i Russland.
432
  ```
433
 
@@ -438,7 +254,7 @@ from transformers import pipeline
438
  pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-gmq-gmq")
439
  print(pipe(">>fao<< Jeg er bange for kakerlakker."))
440
 
441
- # expected output: Tað eru uml.
442
  ```
443
 
444
  ## Training
@@ -484,7 +300,7 @@ print(pipe(">>fao<< Jeg er bange for kakerlakker."))
484
 
485
  ## Citation Information
486
 
487
- * Publications: [OPUS-MT Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (Please, cite if you use this model.)
488
 
489
  ```
490
  @inproceedings{tiedemann-thottingal-2020-opus,
@@ -514,7 +330,7 @@ print(pipe(">>fao<< Jeg er bange for kakerlakker."))
514
 
515
  ## Acknowledgements
516
 
517
- The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Unions Horizon 2020 research and innovation programme (grant agreement No 771113), and the [MeMAD project](https://memad.eu/), funded by the European Unions Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.
518
 
519
  ## Model conversion info
520
 
 
5
  - nb
6
  - nn
7
  - sv
8
+ - multilingual
9
+ license: cc-by-4.0
10
  tags:
11
  - translation
12
  - opus-mt-tc
13
  model-index:
14
  - name: opus-mt-tc-big-gmq-gmq
15
  results:
16
  - task:
17
  type: translation
18
+ name: Translation isl-swe
19
  dataset:
20
  name: europeana2021
21
  type: europeana2021
22
  args: isl-swe
23
  metrics:
24
+ - type: bleu
25
+ value: 22.2
26
+ name: BLEU
27
+ - type: chrf
28
+ value: 0.45562
29
+ name: chr-F
30
+ - type: bleu
31
+ value: 29.7
32
+ name: BLEU
33
+ - type: chrf
34
+ value: 0.54171
35
+ name: chr-F
36
+ - type: bleu
37
+ value: 54.0
38
+ name: BLEU
39
+ - type: chrf
40
+ value: 0.73891
41
+ name: chr-F
42
  - task:
43
  type: translation
44
  name: Translation dan-isl
45
  dataset:
46
  name: flores101-devtest
47
  type: flores_101
48
  args: dan isl devtest
49
  metrics:
50
+ - type: bleu
51
+ value: 22.2
52
+ name: BLEU
53
+ - type: chrf
54
+ value: 0.50227
55
+ name: chr-F
56
+ - type: bleu
57
+ value: 28.6
58
+ name: BLEU
59
+ - type: chrf
60
+ value: 0.58445
61
+ name: chr-F
62
+ - type: bleu
63
+ value: 38.5
64
+ name: BLEU
65
+ - type: chrf
66
+ value: 0.65
67
+ name: chr-F
68
+ - type: bleu
69
+ value: 27.2
70
+ name: BLEU
71
+ - type: chrf
72
+ value: 0.5363
73
+ name: chr-F
74
+ - type: bleu
75
+ value: 20.5
76
+ name: BLEU
77
+ - type: chrf
78
+ value: 0.49434
79
+ name: chr-F
80
+ - type: bleu
81
+ value: 26.0
82
+ name: BLEU
83
+ - type: chrf
84
+ value: 0.53373
85
+ name: chr-F
86
+ - type: bleu
87
+ value: 31.7
88
+ name: BLEU
89
+ - type: chrf
90
+ value: 0.59657
91
+ name: chr-F
92
+ - type: bleu
93
+ value: 18.9
94
+ name: BLEU
95
+ - type: chrf
96
+ value: 0.47432
97
+ name: chr-F
98
+ - type: bleu
99
+ value: 31.3
100
+ name: BLEU
101
+ - type: chrf
102
+ value: 0.6003
103
+ name: chr-F
104
+ - type: bleu
105
+ value: 39.0
106
+ name: BLEU
107
+ - type: chrf
108
+ value: 0.6434
109
+ name: chr-F
110
+ - type: bleu
111
+ value: 21.7
112
+ name: BLEU
113
+ - type: chrf
114
+ value: 0.4959
115
+ name: chr-F
116
+ - type: bleu
117
+ value: 28.9
118
+ name: BLEU
119
+ - type: chrf
120
+ value: 0.58336
121
+ name: chr-F
122
  - task:
123
  type: translation
124
  name: Translation dan-nob
125
  dataset:
126
  name: tatoeba-test-v2021-08-07
127
  type: tatoeba_mt
128
  args: dan-nob
129
  metrics:
130
+ - type: bleu
131
+ value: 78.2
132
+ name: BLEU
133
+ - type: chrf
134
+ value: 0.87556
135
+ name: chr-F
136
+ - type: bleu
137
+ value: 72.5
138
+ name: BLEU
139
+ - type: chrf
140
+ value: 0.83556
141
+ name: chr-F
142
+ - type: bleu
143
+ value: 78.9
144
+ name: BLEU
145
+ - type: chrf
146
+ value: 0.88349
147
+ name: chr-F
148
+ - type: bleu
149
+ value: 73.9
150
+ name: BLEU
151
+ - type: chrf
152
+ value: 0.85345
153
+ name: chr-F
154
+ - type: bleu
155
+ value: 55.2
156
+ name: BLEU
157
+ - type: chrf
158
+ value: 0.74571
159
+ name: chr-F
160
+ - type: bleu
161
+ value: 73.9
162
+ name: BLEU
163
+ - type: chrf
164
+ value: 0.84747
165
+ name: chr-F
166
+ - type: bleu
167
+ value: 72.6
168
+ name: BLEU
169
+ - type: chrf
170
+ value: 0.83392
171
+ name: chr-F
172
+ - type: bleu
173
+ value: 76.3
174
+ name: BLEU
175
+ - type: chrf
176
+ value: 0.85815
177
+ name: chr-F
178
  ---
179
  # opus-mt-tc-big-gmq-gmq
180
 
 
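On the new side of the diff, several chrF scores are written without trailing zeros. For example, `0.65000` becomes `0.65` and `0.53630` becomes `0.5363`. Each metric entry also lists its keys as `type`, `value`, `name` rather than `name`, `type`, `value`. Neither spelling change alters the numbers. A YAML float with trailing zeros denotes the same value, and key order inside a mapping carries no meaning.

The sketch below checks the old and new spellings taken from the diff. It uses plain `float()` as a stand-in for how a YAML loader would read these scalars, which is an assumption rather than a call into any specific YAML library.

```python
# chrF values as spelled on the old (left) and new (right) side of the diff.
OLD_TO_NEW = {
    "0.65000": "0.65",
    "0.53630": "0.5363",
    "0.60030": "0.6003",
    "0.64340": "0.6434",
    "0.49590": "0.4959",
}


def same_value(old: str, new: str) -> bool:
    """True if both spellings denote the same float value."""
    return float(old) == float(new)


# Key order inside a metric entry does not matter: dict equality ignores order.
old_entry = {"name": "chr-F", "type": "chrf", "value": float("0.65000")}
new_entry = {"type": "chrf", "value": float("0.65"), "name": "chr-F"}

print(all(same_value(o, n) for o, n in OLD_TO_NEW.items()))  # True
print(old_entry == new_entry)  # True
```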
231
 
232
  src_text = [
233
  ">>fao<< Jeg er bange for kakerlakker.",
234
+ ">>nob<< Vladivostok är en stad i Ryssland."
235
  ]
236
 
237
  model_name = "pytorch-models/opus-mt-tc-big-gmq-gmq"
 
243
  print( tokenizer.decode(t, skip_special_tokens=True) )
244
 
245
  # expected output:
246
+ # Tað eru uml.
247
  # Vladivostok er en by i Russland.
248
  ```
249
 
 
254
  pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-gmq-gmq")
255
  print(pipe(">>fao<< Jeg er bange for kakerlakker."))
256
 
257
+ # expected output: Tað eru uml.
258
  ```
259
 
260
  ## Training
 
300
 
301
  ## Citation Information
302
 
303
+ * Publications: [OPUS-MT Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (Please, cite if you use this model.)
304
 
305
  ```
306
  @inproceedings{tiedemann-thottingal-2020-opus,
 
330
 
331
  ## Acknowledgements
332
 
333
+ The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Unions Horizon 2020 research and innovation programme (grant agreement No 771113), and the [MeMAD project](https://memad.eu/), funded by the European Unions Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.
334
 
335
  ## Model conversion info
336
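The usage examples in this README choose the output language by prefixing each source sentence with a target-language token such as `>>fao<<` or `>>nob<<`. Below is a minimal sketch of a helper that builds such inputs. The function name `with_target_token` is hypothetical and is not part of the transformers API. Its format check (lowercase ASCII letters only) is an assumption for illustration.

```python
def with_target_token(sentences: list[str], target_lang: str) -> list[str]:
    """Prefix each sentence with a target-language token such as >>nob<<.

    Hypothetical helper: the `>>xxx<<` format mirrors the README examples.
    """
    # Assumed sanity check: language codes are lowercase ASCII letters.
    if not (target_lang.isascii() and target_lang.isalpha() and target_lang.islower()):
        raise ValueError(f"unexpected language code: {target_lang!r}")
    return [f">>{target_lang}<< {sentence}" for sentence in sentences]


src_text = with_target_token(["Jeg er bange for kakerlakker."], "fao")
print(src_text)  # ['>>fao<< Jeg er bange for kakerlakker.']
```

The resulting list can replace a hand-written `src_text` in the tokenizer example, or be passed to the `pipeline("translation", ...)` call. The simple format check does not guarantee that the model knows a given code. The set of valid target tokens comes from the model's vocabulary.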