Add multilingual to the language tag

#2
by lbourdois - opened
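
For readers skimming the diff, the front-matter change named in the title can be sketched as below. This is a minimal illustration, not the tooling that produced the PR: `add_multilingual_tag` is a hypothetical helper, and the rule "add the tag when more than one language is listed" is an assumption. The diff itself only shows `multilingual` being appended after `oc`.

```python
def add_multilingual_tag(languages):
    """Return a copy of `languages` with "multilingual" appended when needed.

    Assumed rule (not confirmed by the PR): a card that lists more than one
    language code should also carry the "multilingual" tag.
    """
    tags = list(languages)
    distinct = set(tags) - {"multilingual"}
    if len(distinct) > 1 and "multilingual" not in tags:
        tags.append("multilingual")
    return tags


# Language codes visible in the diff hunk; the full list may start above it.
before = ["en", "es", "oc"]
after = add_multilingual_tag(before)
print(after)  # ['en', 'es', 'oc', 'multilingual']
```

The helper returns a new list instead of mutating its argument, so it can be applied repeatedly without adding the tag twice.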
Files changed (1)
  1. README.md +40 -75
README.md CHANGED
@@ -4,157 +4,122 @@ language:
4
  - en
5
  - es
6
  - oc
 
 
7
  tags:
8
  - translation
9
  - opus-mt-tc
10
- license: cc-by-4.0
11
  model-index:
12
  - name: opus-mt-tc-big-cat_oci_spa-en
13
  results:
14
  - task:
15
- name: Translation cat-eng
16
  type: translation
17
- args: cat-eng
18
  dataset:
19
  name: flores101-devtest
20
  type: flores_101
21
  args: cat eng devtest
22
  metrics:
23
- - name: BLEU
24
- type: bleu
25
  value: 45.4
26
- - task:
27
- name: Translation oci-eng
28
- type: translation
29
- args: oci-eng
30
- dataset:
31
- name: flores101-devtest
32
- type: flores_101
33
- args: oci eng devtest
34
- metrics:
35
- - name: BLEU
36
- type: bleu
37
  value: 37.5
38
- - task:
39
- name: Translation spa-eng
40
- type: translation
41
- args: spa-eng
42
- dataset:
43
- name: flores101-devtest
44
- type: flores_101
45
- args: spa eng devtest
46
- metrics:
47
- - name: BLEU
48
- type: bleu
49
  value: 29.9
 
50
  - task:
51
- name: Translation spa-eng
52
  type: translation
53
- args: spa-eng
54
  dataset:
55
  name: news-test2008
56
  type: news-test2008
57
  args: spa-eng
58
  metrics:
59
- - name: BLEU
60
- type: bleu
61
  value: 27.9
 
62
  - task:
63
- name: Translation cat-eng
64
  type: translation
65
- args: cat-eng
66
  dataset:
67
  name: tatoeba-test-v2021-08-07
68
  type: tatoeba_mt
69
  args: cat-eng
70
  metrics:
71
- - name: BLEU
72
- type: bleu
73
  value: 57.3
74
- - task:
75
- name: Translation spa-eng
76
- type: translation
77
- args: spa-eng
78
- dataset:
79
- name: tatoeba-test-v2021-08-07
80
- type: tatoeba_mt
81
- args: spa-eng
82
- metrics:
83
- - name: BLEU
84
- type: bleu
85
  value: 62.3
 
86
  - task:
87
- name: Translation spa-eng
88
  type: translation
89
- args: spa-eng
90
  dataset:
91
  name: tico19-test
92
  type: tico19-test
93
  args: spa-eng
94
  metrics:
95
- - name: BLEU
96
- type: bleu
97
  value: 51.8
 
98
  - task:
99
- name: Translation spa-eng
100
  type: translation
101
- args: spa-eng
102
  dataset:
103
  name: newstest2009
104
  type: wmt-2009-news
105
  args: spa-eng
106
  metrics:
107
- - name: BLEU
108
- type: bleu
109
  value: 30.2
 
110
  - task:
111
- name: Translation spa-eng
112
  type: translation
113
- args: spa-eng
114
  dataset:
115
  name: newstest2010
116
  type: wmt-2010-news
117
  args: spa-eng
118
  metrics:
119
- - name: BLEU
120
- type: bleu
121
  value: 36.8
 
122
  - task:
123
- name: Translation spa-eng
124
  type: translation
125
- args: spa-eng
126
  dataset:
127
  name: newstest2011
128
  type: wmt-2011-news
129
  args: spa-eng
130
  metrics:
131
- - name: BLEU
132
- type: bleu
133
  value: 34.7
 
134
  - task:
135
- name: Translation spa-eng
136
  type: translation
137
- args: spa-eng
138
  dataset:
139
  name: newstest2012
140
  type: wmt-2012-news
141
  args: spa-eng
142
  metrics:
143
- - name: BLEU
144
- type: bleu
145
  value: 38.6
 
146
  - task:
147
- name: Translation spa-eng
148
  type: translation
149
- args: spa-eng
150
  dataset:
151
  name: newstest2013
152
  type: wmt-2013-news
153
  args: spa-eng
154
  metrics:
155
- - name: BLEU
156
- type: bleu
157
  value: 35.3
 
158
  ---
159
  # opus-mt-tc-big-cat_oci_spa-en
160
 
@@ -162,7 +127,7 @@ Neural machine translation model for translating from Catalan, Occitan and Spani
162
 
163
  This model is part of the [OPUS-MT project](https://github.com/Helsinki-NLP/Opus-MT), an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained using the amazing framework of [Marian NMT](https://marian-nmt.github.io/), an efficient NMT implementation written in pure C++. The models have been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from [OPUS](https://opus.nlpl.eu/) and training pipelines use the procedures of [OPUS-MT-train](https://github.com/Helsinki-NLP/Opus-MT-train).
164
 
165
- * Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (Please, cite if you use this model.)
166
 
167
  ```
168
  @inproceedings{tiedemann-thottingal-2020-opus,
@@ -209,8 +174,8 @@ A short example code:
209
  from transformers import MarianMTModel, MarianTokenizer
210
 
211
  src_text = [
212
- "¿Puedo hacerte una pregunta?",
213
- "Toca algo de música."
214
  ]
215
 
216
  model_name = "pytorch-models/opus-mt-tc-big-cat_oci_spa-en"
@@ -231,7 +196,7 @@ You can also use OPUS-MT models with the transformers pipelines, for example:
231
  ```python
232
  from transformers import pipeline
233
  pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-cat_oci_spa-en")
234
- print(pipe("¿Puedo hacerte una pregunta?"))
235
 
236
  # expected output: Can I ask you a question?
237
  ```
@@ -261,7 +226,7 @@ print(pipe("¿Puedo hacerte una pregunta?"))
261
 
262
  ## Acknowledgements
263
 
264
- The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 771113), and the [MeMAD project](https://memad.eu/), funded by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.
265
 
266
  ## Model conversion info
267
 
 
4
  - en
5
  - es
6
  - oc
7
+ - multilingual
8
+ license: cc-by-4.0
9
  tags:
10
  - translation
11
  - opus-mt-tc
 
12
  model-index:
13
  - name: opus-mt-tc-big-cat_oci_spa-en
14
  results:
15
  - task:
 
16
  type: translation
17
+ name: Translation cat-eng
18
  dataset:
19
  name: flores101-devtest
20
  type: flores_101
21
  args: cat eng devtest
22
  metrics:
23
+ - type: bleu
 
24
  value: 45.4
25
+ name: BLEU
26
+ - type: bleu
 
 
 
 
 
 
 
 
 
27
  value: 37.5
28
+ name: BLEU
29
+ - type: bleu
 
 
 
 
 
 
 
 
 
30
  value: 29.9
31
+ name: BLEU
32
  - task:
 
33
  type: translation
34
+ name: Translation spa-eng
35
  dataset:
36
  name: news-test2008
37
  type: news-test2008
38
  args: spa-eng
39
  metrics:
40
+ - type: bleu
 
41
  value: 27.9
42
+ name: BLEU
43
  - task:
 
44
  type: translation
45
+ name: Translation cat-eng
46
  dataset:
47
  name: tatoeba-test-v2021-08-07
48
  type: tatoeba_mt
49
  args: cat-eng
50
  metrics:
51
+ - type: bleu
 
52
  value: 57.3
53
+ name: BLEU
54
+ - type: bleu
 
 
 
 
 
 
 
 
 
55
  value: 62.3
56
+ name: BLEU
57
  - task:
 
58
  type: translation
59
+ name: Translation spa-eng
60
  dataset:
61
  name: tico19-test
62
  type: tico19-test
63
  args: spa-eng
64
  metrics:
65
+ - type: bleu
 
66
  value: 51.8
67
+ name: BLEU
68
  - task:
 
69
  type: translation
70
+ name: Translation spa-eng
71
  dataset:
72
  name: newstest2009
73
  type: wmt-2009-news
74
  args: spa-eng
75
  metrics:
76
+ - type: bleu
 
77
  value: 30.2
78
+ name: BLEU
79
  - task:
 
80
  type: translation
81
+ name: Translation spa-eng
82
  dataset:
83
  name: newstest2010
84
  type: wmt-2010-news
85
  args: spa-eng
86
  metrics:
87
+ - type: bleu
 
88
  value: 36.8
89
+ name: BLEU
90
  - task:
 
91
  type: translation
92
+ name: Translation spa-eng
93
  dataset:
94
  name: newstest2011
95
  type: wmt-2011-news
96
  args: spa-eng
97
  metrics:
98
+ - type: bleu
 
99
  value: 34.7
100
+ name: BLEU
101
  - task:
 
102
  type: translation
103
+ name: Translation spa-eng
104
  dataset:
105
  name: newstest2012
106
  type: wmt-2012-news
107
  args: spa-eng
108
  metrics:
109
+ - type: bleu
 
110
  value: 38.6
111
+ name: BLEU
112
  - task:
 
113
  type: translation
114
+ name: Translation spa-eng
115
  dataset:
116
  name: newstest2013
117
  type: wmt-2013-news
118
  args: spa-eng
119
  metrics:
120
+ - type: bleu
 
121
  value: 35.3
122
+ name: BLEU
123
  ---
124
  # opus-mt-tc-big-cat_oci_spa-en
125
 
 
127
 
128
  This model is part of the [OPUS-MT project](https://github.com/Helsinki-NLP/Opus-MT), an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained using the amazing framework of [Marian NMT](https://marian-nmt.github.io/), an efficient NMT implementation written in pure C++. The models have been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from [OPUS](https://opus.nlpl.eu/) and training pipelines use the procedures of [OPUS-MT-train](https://github.com/Helsinki-NLP/Opus-MT-train).
129
 
130
+ * Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (Please, cite if you use this model.)
131
 
132
  ```
133
  @inproceedings{tiedemann-thottingal-2020-opus,
 
174
  from transformers import MarianMTModel, MarianTokenizer
175
 
176
  src_text = [
177
+ "¿Puedo hacerte una pregunta?",
178
+ "Toca algo de música."
179
  ]
180
 
181
  model_name = "pytorch-models/opus-mt-tc-big-cat_oci_spa-en"
 
196
  ```python
197
  from transformers import pipeline
198
  pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-cat_oci_spa-en")
199
+ print(pipe("¿Puedo hacerte una pregunta?"))
200
 
201
  # expected output: Can I ask you a question?
202
  ```
 
226
 
227
  ## Acknowledgements
228
 
229
+ The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 771113), and the [MeMAD project](https://memad.eu/), funded by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.
230
 
231
  ## Model conversion info
232