Add multilingual to the language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +57 -114
README.md CHANGED
@@ -5,135 +5,78 @@ language:
5
  - is
6
  - nb
7
  - sv
8
-
 
9
  tags:
10
  - translation
11
  - opus-mt-tc
12
-
13
- license: cc-by-4.0
14
  model-index:
15
  - name: opus-mt-tc-big-de-gmq
16
  results:
17
  - task:
18
- name: Translation deu-dan
19
  type: translation
20
- args: deu-dan
21
  dataset:
22
  name: flores101-devtest
23
  type: flores_101
24
  args: deu dan devtest
25
  metrics:
26
- - name: BLEU
27
- type: bleu
28
- value: 35.6
29
- - name: chr-F
30
- type: chrf
31
- value: 0.62363
32
- - task:
33
- name: Translation deu-isl
34
- type: translation
35
- args: deu-isl
36
- dataset:
37
- name: flores101-devtest
38
- type: flores_101
39
- args: deu isl devtest
40
- metrics:
41
- - name: BLEU
42
- type: bleu
43
- value: 20.6
44
- - name: chr-F
45
- type: chrf
46
- value: 0.48691
 
 
 
47
  - task:
48
- name: Translation deu-nob
49
  type: translation
50
- args: deu-nob
51
- dataset:
52
- name: flores101-devtest
53
- type: flores_101
54
- args: deu nob devtest
55
- metrics:
56
- - name: BLEU
57
- type: bleu
58
- value: 25.3
59
- - name: chr-F
60
- type: chrf
61
- value: 0.55765
62
- - task:
63
- name: Translation deu-swe
64
- type: translation
65
- args: deu-swe
66
- dataset:
67
- name: flores101-devtest
68
- type: flores_101
69
- args: deu swe devtest
70
- metrics:
71
- - name: BLEU
72
- type: bleu
73
- value: 34.7
74
- - name: chr-F
75
- type: chrf
76
- value: 0.62323
77
- - task:
78
  name: Translation deu-dan
79
- type: translation
80
- args: deu-dan
81
  dataset:
82
  name: tatoeba-test-v2021-08-07
83
  type: tatoeba_mt
84
  args: deu-dan
85
  metrics:
86
- - name: BLEU
87
- type: bleu
88
- value: 58.7
89
- - name: chr-F
90
- type: chrf
91
- value: 0.74306
92
- - task:
93
- name: Translation deu-isl
94
- type: translation
95
- args: deu-isl
96
- dataset:
97
- name: tatoeba-test-v2021-08-07
98
- type: tatoeba_mt
99
- args: deu-isl
100
- metrics:
101
- - name: BLEU
102
- type: bleu
103
- value: 47.1
104
- - name: chr-F
105
- type: chrf
106
- value: 0.65180
107
- - task:
108
- name: Translation deu-nob
109
- type: translation
110
- args: deu-nob
111
- dataset:
112
- name: tatoeba-test-v2021-08-07
113
- type: tatoeba_mt
114
- args: deu-nob
115
- metrics:
116
- - name: BLEU
117
- type: bleu
118
- value: 52.5
119
- - name: chr-F
120
- type: chrf
121
- value: 0.71062
122
- - task:
123
- name: Translation deu-swe
124
- type: translation
125
- args: deu-swe
126
- dataset:
127
- name: tatoeba-test-v2021-08-07
128
- type: tatoeba_mt
129
- args: deu-swe
130
- metrics:
131
- - name: BLEU
132
- type: bleu
133
- value: 58.3
134
- - name: chr-F
135
- type: chrf
136
- value: 0.72658
137
  ---
138
  # opus-mt-tc-big-de-gmq
139
 
@@ -189,7 +132,7 @@ A short example code:
189
  from transformers import MarianMTModel, MarianTokenizer
190
 
191
  src_text = [
192
- ">>dan<< Ich hätte fast meinen Pass vergessen.",
193
  ">>dan<< Dieses Fenster hier ist schusssicher."
194
  ]
195
 
@@ -202,7 +145,7 @@ for t in translated:
202
  print( tokenizer.decode(t, skip_special_tokens=True) )
203
 
204
  # expected output:
205
- # Jeg havde næsten glemt mit pas.
206
  # Dette vindue er skudsikkert.
207
  ```
208
 
@@ -211,9 +154,9 @@ You can also use OPUS-MT models with the transformers pipelines, for example:
211
  ```python
212
  from transformers import pipeline
213
  pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-de-gmq")
214
- print(pipe(">>dan<< Ich hätte fast meinen Pass vergessen."))
215
 
216
- # expected output: Jeg havde næsten glemt mit pas.
217
  ```
218
 
219
  ## Training
@@ -244,7 +187,7 @@ print(pipe(">>dan<< Ich hätte fast meinen Pass vergessen."))
244
 
245
  ## Citation Information
246
 
247
- * Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (Please cite them if you use this model.)
248
 
249
  ```
250
  @inproceedings{tiedemann-thottingal-2020-opus,
@@ -274,7 +217,7 @@ print(pipe(">>dan<< Ich hätte fast meinen Pass vergessen."))
274
 
275
  ## Acknowledgements
276
 
277
- The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 771113), and the [MeMAD project](https://memad.eu/), funded by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.
278
 
279
  ## Model conversion info
280
 
 
5
  - is
6
  - nb
7
  - sv
8
+ - multilingual
9
+ license: cc-by-4.0
10
  tags:
11
  - translation
12
  - opus-mt-tc
 
 
13
  model-index:
14
  - name: opus-mt-tc-big-de-gmq
15
  results:
16
  - task:
 
17
  type: translation
18
+ name: Translation deu-dan
19
  dataset:
20
  name: flores101-devtest
21
  type: flores_101
22
  args: deu dan devtest
23
  metrics:
24
+ - type: bleu
25
+ value: 35.6
26
+ name: BLEU
27
+ - type: chrf
28
+ value: 0.62363
29
+ name: chr-F
30
+ - type: bleu
31
+ value: 20.6
32
+ name: BLEU
33
+ - type: chrf
34
+ value: 0.48691
35
+ name: chr-F
36
+ - type: bleu
37
+ value: 25.3
38
+ name: BLEU
39
+ - type: chrf
40
+ value: 0.55765
41
+ name: chr-F
42
+ - type: bleu
43
+ value: 34.7
44
+ name: BLEU
45
+ - type: chrf
46
+ value: 0.62323
47
+ name: chr-F
48
  - task:
 
49
  type: translation
50
  name: Translation deu-dan
 
 
51
  dataset:
52
  name: tatoeba-test-v2021-08-07
53
  type: tatoeba_mt
54
  args: deu-dan
55
  metrics:
56
+ - type: bleu
57
+ value: 58.7
58
+ name: BLEU
59
+ - type: chrf
60
+ value: 0.74306
61
+ name: chr-F
62
+ - type: bleu
63
+ value: 47.1
64
+ name: BLEU
65
+ - type: chrf
66
+ value: 0.6518
67
+ name: chr-F
68
+ - type: bleu
69
+ value: 52.5
70
+ name: BLEU
71
+ - type: chrf
72
+ value: 0.71062
73
+ name: chr-F
74
+ - type: bleu
75
+ value: 58.3
76
+ name: BLEU
77
+ - type: chrf
78
+ value: 0.72658
79
+ name: chr-F
80
  ---
81
  # opus-mt-tc-big-de-gmq
82
 
 
132
  from transformers import MarianMTModel, MarianTokenizer
133
 
134
  src_text = [
135
+ ">>dan<< Ich hätte fast meinen Pass vergessen.",
136
  ">>dan<< Dieses Fenster hier ist schusssicher."
137
  ]
138
 
 
145
  print( tokenizer.decode(t, skip_special_tokens=True) )
146
 
147
  # expected output:
148
+ # Jeg havde næsten glemt mit pas.
149
  # Dette vindue er skudsikkert.
150
  ```
151
 
 
154
  ```python
155
  from transformers import pipeline
156
  pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-de-gmq")
157
+ print(pipe(">>dan<< Ich hätte fast meinen Pass vergessen."))
158
 
159
+ # expected output: Jeg havde næsten glemt mit pas.
160
  ```
161
 
162
  ## Training
 
187
 
188
  ## Citation Information
189
 
190
+ * Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (Please cite them if you use this model.)
191
 
192
  ```
193
  @inproceedings{tiedemann-thottingal-2020-opus,
 
217
 
218
  ## Acknowledgements
219
 
220
+ The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 771113), and the [MeMAD project](https://memad.eu/), funded by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.
221
 
222
  ## Model conversion info
223
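
The README examples in this diff prefix each source sentence with a target-language token such as `>>dan<<`. A minimal sketch of a helper that builds such inputs is given here, assuming the `>>xxx<<` format shown in the examples. The function name `with_target_token` is hypothetical and not part of `transformers`, and only the `dan` token appears in the README text itself.

```python
def with_target_token(sentences, target_lang):
    """Prefix each sentence with a target-language token such as '>>dan<<'.

    Hypothetical helper: the token format is assumed from the README examples.
    """
    if not target_lang.isalpha():
        raise ValueError(f"unexpected language code: {target_lang!r}")
    token = f">>{target_lang.lower()}<<"
    # Strip surrounding whitespace so the token and the text are separated by one space.
    return [f"{token} {sentence.strip()}" for sentence in sentences]


src_text = with_target_token(
    [
        "Ich hätte fast meinen Pass vergessen.",
        "Dieses Fenster hier ist schusssicher.",
    ],
    "dan",
)
print(src_text)
```

The resulting list has the same shape as the `src_text` list in the README example, so it could be passed to the tokenizer or pipeline in the same way.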