Add multilingual to the language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +76 -144
README.md CHANGED
@@ -5,184 +5,116 @@ language:
5
  - hr
6
  - sh
7
  - sr
 
 
 
 
 
8
  language_bcp47:
9
  - bs_Latn
10
  - sr_Cyrl
11
  - sr_Latn
12
-
13
- tags:
14
- - translation
15
- - opus-mt-tc
16
-
17
- license: cc-by-4.0
18
  model-index:
19
  - name: opus-mt-tc-base-en-sh
20
  results:
21
  - task:
22
- name: Translation eng-hrv
23
  type: translation
24
- args: eng-hrv
25
  dataset:
26
  name: flores200-dev
27
  type: flores200-dev
28
  args: eng-hrv
29
  metrics:
30
- - name: BLEU
31
- type: bleu
32
- value: 28.1
33
- - name: chr-F
34
- type: chrf
35
- value: 0.57963
 
 
 
 
 
 
36
  - task:
37
- name: Translation eng-srp_Cyrl
38
  type: translation
39
- args: eng-srp_Cyrl
40
- dataset:
41
- name: flores200-dev
42
- type: flores200-dev
43
- args: eng-srp_Cyrl
44
- metrics:
45
- - name: BLEU
46
- type: bleu
47
- value: 32.2
48
- - name: chr-F
49
- type: chrf
50
- value: 0.60096
51
- - task:
52
  name: Translation eng-hrv
53
- type: translation
54
- args: eng-hrv
55
  dataset:
56
  name: flores200-devtest
57
  type: flores200-devtest
58
  args: eng-hrv
59
  metrics:
60
- - name: BLEU
61
- type: bleu
62
- value: 28.9
63
- - name: chr-F
64
- type: chrf
65
- value: 0.58652
 
 
 
 
 
 
66
  - task:
67
- name: Translation eng-srp_Cyrl
68
  type: translation
69
- args: eng-srp_Cyrl
70
- dataset:
71
- name: flores200-devtest
72
- type: flores200-devtest
73
- args: eng-srp_Cyrl
74
- metrics:
75
- - name: BLEU
76
- type: bleu
77
- value: 31.7
78
- - name: chr-F
79
- type: chrf
80
- value: 0.59874
81
- - task:
82
  name: Translation eng-hrv
83
- type: translation
84
- args: eng-hrv
85
  dataset:
86
  name: flores101-devtest
87
  type: flores_101
88
  args: eng hrv devtest
89
  metrics:
90
- - name: BLEU
91
- type: bleu
92
- value: 28.7
93
- - name: chr-F
94
- type: chrf
95
- value: 0.586
 
 
 
 
 
 
96
  - task:
97
- name: Translation eng-srp_Cyrl
98
  type: translation
99
- args: eng-srp_Cyrl
100
- dataset:
101
- name: flores101-devtest
102
- type: flores_101
103
- args: eng srp_Cyrl devtest
104
- metrics:
105
- - name: BLEU
106
- type: bleu
107
- value: 31.7
108
- - name: chr-F
109
- type: chrf
110
- value: 0.59874
111
- - task:
112
  name: Translation eng-bos_Latn
113
- type: translation
114
- args: eng-bos_Latn
115
  dataset:
116
  name: tatoeba-test-v2021-08-07
117
  type: tatoeba_mt
118
  args: eng-bos_Latn
119
  metrics:
120
- - name: BLEU
121
- type: bleu
122
- value: 46.3
123
- - name: chr-F
124
- type: chrf
125
- value: 0.666
126
- - task:
127
- name: Translation eng-hbs
128
- type: translation
129
- args: eng-hbs
130
- dataset:
131
- name: tatoeba-test-v2021-08-07
132
- type: tatoeba_mt
133
- args: eng-hbs
134
- metrics:
135
- - name: BLEU
136
- type: bleu
137
- value: 42.1
138
- - name: chr-F
139
- type: chrf
140
- value: 0.631
141
- - task:
142
- name: Translation eng-hrv
143
- type: translation
144
- args: eng-hrv
145
- dataset:
146
- name: tatoeba-test-v2021-08-07
147
- type: tatoeba_mt
148
- args: eng-hrv
149
- metrics:
150
- - name: BLEU
151
- type: bleu
152
- value: 49.7
153
- - name: chr-F
154
- type: chrf
155
- value: 0.691
156
- - task:
157
- name: Translation eng-srp_Cyrl
158
- type: translation
159
- args: eng-srp_Cyrl
160
- dataset:
161
- name: tatoeba-test-v2021-08-07
162
- type: tatoeba_mt
163
- args: eng-srp_Cyrl
164
- metrics:
165
- - name: BLEU
166
- type: bleu
167
- value: 45.1
168
- - name: chr-F
169
- type: chrf
170
- value: 0.645
171
- - task:
172
- name: Translation eng-srp_Latn
173
- type: translation
174
- args: eng-srp_Latn
175
- dataset:
176
- name: tatoeba-test-v2021-08-07
177
- type: tatoeba_mt
178
- args: eng-srp_Latn
179
- metrics:
180
- - name: BLEU
181
- type: bleu
182
- value: 39.8
183
- - name: chr-F
184
- type: chrf
185
- value: 0.613
186
  ---
187
  # opus-mt-tc-base-en-sh
188
 
@@ -251,7 +183,7 @@ for t in translated:
251
  print( tokenizer.decode(t, skip_special_tokens=True) )
252
 
253
  # expected output:
254
- # Ti si o tome napraviti vrlo ozbiljnu pogrešku.
255
  # [4]
256
  ```
257
 
@@ -262,7 +194,7 @@ from transformers import pipeline
262
  pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-base-en-sh")
263
  print(pipe(">>hrv<< You're about to make a very serious mistake."))
264
 
265
- # expected output: Ti si o tome napraviti vrlo ozbiljnu pogrešku.
266
  ```
267
 
268
  ## Training
@@ -296,7 +228,7 @@ print(pipe(">>hrv<< You're about to make a very serious mistake."))
296
 
297
  ## Citation Information
298
 
299
- * Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (Please, cite if you use this model.)
300
 
301
  ```
302
  @inproceedings{tiedemann-thottingal-2020-opus,
@@ -326,7 +258,7 @@ print(pipe(">>hrv<< You're about to make a very serious mistake."))
326
 
327
  ## Acknowledgements
328
 
329
- The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 771113), and the [MeMAD project](https://memad.eu/), funded by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.
330
 
331
  ## Model conversion info
332
 
 
5
  - hr
6
  - sh
7
  - sr
8
+ - multilingual
9
+ license: cc-by-4.0
10
+ tags:
11
+ - translation
12
+ - opus-mt-tc
13
  language_bcp47:
14
  - bs_Latn
15
  - sr_Cyrl
16
  - sr_Latn
 
 
 
 
 
 
17
  model-index:
18
  - name: opus-mt-tc-base-en-sh
19
  results:
20
  - task:
 
21
  type: translation
22
+ name: Translation eng-hrv
23
  dataset:
24
  name: flores200-dev
25
  type: flores200-dev
26
  args: eng-hrv
27
  metrics:
28
+ - type: bleu
29
+ value: 28.1
30
+ name: BLEU
31
+ - type: chrf
32
+ value: 0.57963
33
+ name: chr-F
34
+ - type: bleu
35
+ value: 32.2
36
+ name: BLEU
37
+ - type: chrf
38
+ value: 0.60096
39
+ name: chr-F
40
  - task:
 
41
  type: translation
 
42
  name: Translation eng-hrv
 
 
43
  dataset:
44
  name: flores200-devtest
45
  type: flores200-devtest
46
  args: eng-hrv
47
  metrics:
48
+ - type: bleu
49
+ value: 28.9
50
+ name: BLEU
51
+ - type: chrf
52
+ value: 0.58652
53
+ name: chr-F
54
+ - type: bleu
55
+ value: 31.7
56
+ name: BLEU
57
+ - type: chrf
58
+ value: 0.59874
59
+ name: chr-F
60
  - task:
 
61
  type: translation
 
62
  name: Translation eng-hrv
 
 
63
  dataset:
64
  name: flores101-devtest
65
  type: flores_101
66
  args: eng hrv devtest
67
  metrics:
68
+ - type: bleu
69
+ value: 28.7
70
+ name: BLEU
71
+ - type: chrf
72
+ value: 0.586
73
+ name: chr-F
74
+ - type: bleu
75
+ value: 31.7
76
+ name: BLEU
77
+ - type: chrf
78
+ value: 0.59874
79
+ name: chr-F
80
  - task:
 
81
  type: translation
 
82
  name: Translation eng-bos_Latn
 
 
83
  dataset:
84
  name: tatoeba-test-v2021-08-07
85
  type: tatoeba_mt
86
  args: eng-bos_Latn
87
  metrics:
88
+ - type: bleu
89
+ value: 46.3
90
+ name: BLEU
91
+ - type: chrf
92
+ value: 0.666
93
+ name: chr-F
94
+ - type: bleu
95
+ value: 42.1
96
+ name: BLEU
97
+ - type: chrf
98
+ value: 0.631
99
+ name: chr-F
100
+ - type: bleu
101
+ value: 49.7
102
+ name: BLEU
103
+ - type: chrf
104
+ value: 0.691
105
+ name: chr-F
106
+ - type: bleu
107
+ value: 45.1
108
+ name: BLEU
109
+ - type: chrf
110
+ value: 0.645
111
+ name: chr-F
112
+ - type: bleu
113
+ value: 39.8
114
+ name: BLEU
115
+ - type: chrf
116
+ value: 0.613
117
+ name: chr-F
 
118
  ---
119
  # opus-mt-tc-base-en-sh
120
 
 
183
  print( tokenizer.decode(t, skip_special_tokens=True) )
184
 
185
  # expected output:
186
+ # Ti si o tome napraviti vrlo ozbiljnu pogrešku.
187
  # [4]
188
  ```
189
 
 
194
  pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-base-en-sh")
195
  print(pipe(">>hrv<< You're about to make a very serious mistake."))
196
 
197
+ # expected output: Ti si o tome napraviti vrlo ozbiljnu pogrešku.
198
  ```
199
 
200
  ## Training
 
228
 
229
  ## Citation Information
230
 
231
+ * Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (Please, cite if you use this model.)
232
 
233
  ```
234
  @inproceedings{tiedemann-thottingal-2020-opus,
 
258
 
259
  ## Acknowledgements
260
 
261
+ The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 771113), and the [MeMAD project](https://memad.eu/), funded by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.
262
 
263
  ## Model conversion info
264