Commit 9794fa3, committed by system (HF staff)
1 Parent(s): 9293536

Update README.md

Files changed (1)
  1. README.md +254 -0
README.md ADDED
@@ -0,0 +1,254 @@
+ ---
+ language: ine
+ tags:
+ - translation
+
+ license: apache-2.0
+ ---
+
+ ### ine-eng
+
+ * source group: Indo-European languages
+ * target group: English
+ * OPUS readme: [ine-eng](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/ine-eng/README.md)
+
+ * model: transformer
+ * source language(s): afr aln ang_Latn arg asm ast awa bel bel_Latn ben bho bos_Latn bre bul bul_Latn cat ces cor cos csb_Latn cym dan deu dsb egl ell enm_Latn ext fao fra frm_Latn frr fry gcf_Latn gla gle glg glv gom gos got_Goth grc_Grek gsw guj hat hif_Latn hin hrv hsb hye ind isl ita jdt_Cyrl ksh kur_Arab kur_Latn lad lad_Latn lat_Latn lav lij lit lld_Latn lmo ltg ltz mai mar max_Latn mfe min mkd mwl nds nld nno nob nob_Hebr non_Latn npi oci ori orv_Cyrl oss pan_Guru pap pdc pes pes_Latn pes_Thaa pms pnb pol por prg_Latn pus roh rom ron rue rus san_Deva scn sco sgs sin slv snd_Arab spa sqi srp_Cyrl srp_Latn stq swe swg tgk_Cyrl tly_Latn tmw_Latn ukr urd vec wln yid zlm_Latn zsm_Latn zza
+ * target language(s): eng
+ * model: transformer
+ * pre-processing: normalization + SentencePiece (spm32k,spm32k)
+ * download original weights: [opus2m-2020-08-01.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/ine-eng/opus2m-2020-08-01.zip)
+ * test set translations: [opus2m-2020-08-01.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/ine-eng/opus2m-2020-08-01.test.txt)
+ * test set scores: [opus2m-2020-08-01.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/ine-eng/opus2m-2020-08-01.eval.txt)
+
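The source language list above mixes plain three-letter codes (`afr`, `pes`) with variants that carry a suffix such as `bel_Latn` or `pes_Thaa`. The sketch below splits such entries and groups the variants per language. It assumes the part after the underscore is a script tag, which the README does not state explicitly; `SourceCode`, `parse_source_code`, and `scripts_by_language` are hypothetical helper names, and `SAMPLE` is only an excerpt of the full list.

```python
from typing import Dict, List, NamedTuple, Optional


class SourceCode(NamedTuple):
    language: str          # three-letter language code, e.g. "bel"
    script: Optional[str]  # assumed script tag, e.g. "Latn"; None if absent


def parse_source_code(code: str) -> SourceCode:
    """Split an entry like 'bel_Latn' into ('bel', 'Latn')."""
    language, _, script = code.partition("_")
    return SourceCode(language, script or None)


def scripts_by_language(codes: str) -> Dict[str, List[str]]:
    """Group a whitespace-separated code list by language, in input order."""
    grouped: Dict[str, List[str]] = {}
    for code in codes.split():
        parsed = parse_source_code(code)
        grouped.setdefault(parsed.language, []).append(parsed.script or "unspecified")
    return grouped


# Excerpt of the "source language(s)" entry, not the full list.
SAMPLE = ("afr bel bel_Latn got_Goth grc_Grek kur_Arab kur_Latn "
          "pes pes_Latn pes_Thaa srp_Cyrl srp_Latn")

if __name__ == "__main__":
    for language, scripts in scripts_by_language(SAMPLE).items():
        print(language, scripts)
```

Grouping this way makes it easy to see which languages appear in more than one written form (for example `pes`, `kur`, and `srp` in the excerpt) before preparing input text.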
+ ## Benchmarks
+
+ | testset | BLEU | chr-F |
+ |-----------------------|-------|-------|
+ | newsdev2014-hineng.hin.eng | 11.2 | 0.375 |
+ | newsdev2016-enro-roneng.ron.eng | 35.5 | 0.614 |
+ | newsdev2017-enlv-laveng.lav.eng | 25.1 | 0.542 |
+ | newsdev2019-engu-gujeng.guj.eng | 16.0 | 0.420 |
+ | newsdev2019-enlt-liteng.lit.eng | 24.0 | 0.522 |
+ | newsdiscussdev2015-enfr-fraeng.fra.eng | 30.1 | 0.550 |
+ | newsdiscusstest2015-enfr-fraeng.fra.eng | 33.4 | 0.572 |
+ | newssyscomb2009-ceseng.ces.eng | 24.0 | 0.520 |
+ | newssyscomb2009-deueng.deu.eng | 25.7 | 0.526 |
+ | newssyscomb2009-fraeng.fra.eng | 27.9 | 0.550 |
+ | newssyscomb2009-itaeng.ita.eng | 31.4 | 0.574 |
+ | newssyscomb2009-spaeng.spa.eng | 28.3 | 0.555 |
+ | news-test2008-deueng.deu.eng | 24.0 | 0.515 |
+ | news-test2008-fraeng.fra.eng | 24.5 | 0.524 |
+ | news-test2008-spaeng.spa.eng | 25.5 | 0.533 |
+ | newstest2009-ceseng.ces.eng | 23.3 | 0.516 |
+ | newstest2009-deueng.deu.eng | 23.2 | 0.512 |
+ | newstest2009-fraeng.fra.eng | 27.3 | 0.545 |
+ | newstest2009-itaeng.ita.eng | 30.3 | 0.567 |
+ | newstest2009-spaeng.spa.eng | 27.9 | 0.549 |
+ | newstest2010-ceseng.ces.eng | 23.8 | 0.523 |
+ | newstest2010-deueng.deu.eng | 26.2 | 0.545 |
+ | newstest2010-fraeng.fra.eng | 28.6 | 0.562 |
+ | newstest2010-spaeng.spa.eng | 31.4 | 0.581 |
+ | newstest2011-ceseng.ces.eng | 24.2 | 0.521 |
+ | newstest2011-deueng.deu.eng | 23.9 | 0.522 |
+ | newstest2011-fraeng.fra.eng | 29.5 | 0.570 |
+ | newstest2011-spaeng.spa.eng | 30.3 | 0.570 |
+ | newstest2012-ceseng.ces.eng | 23.5 | 0.516 |
+ | newstest2012-deueng.deu.eng | 24.9 | 0.529 |
+ | newstest2012-fraeng.fra.eng | 30.0 | 0.568 |
+ | newstest2012-ruseng.rus.eng | 29.9 | 0.565 |
+ | newstest2012-spaeng.spa.eng | 33.3 | 0.593 |
+ | newstest2013-ceseng.ces.eng | 25.6 | 0.531 |
+ | newstest2013-deueng.deu.eng | 27.7 | 0.545 |
+ | newstest2013-fraeng.fra.eng | 30.0 | 0.561 |
+ | newstest2013-ruseng.rus.eng | 24.4 | 0.514 |
+ | newstest2013-spaeng.spa.eng | 30.8 | 0.577 |
+ | newstest2014-csen-ceseng.ces.eng | 27.7 | 0.558 |
+ | newstest2014-deen-deueng.deu.eng | 27.7 | 0.545 |
+ | newstest2014-fren-fraeng.fra.eng | 32.2 | 0.592 |
+ | newstest2014-hien-hineng.hin.eng | 16.7 | 0.450 |
+ | newstest2014-ruen-ruseng.rus.eng | 27.2 | 0.552 |
+ | newstest2015-encs-ceseng.ces.eng | 25.4 | 0.518 |
+ | newstest2015-ende-deueng.deu.eng | 28.8 | 0.552 |
+ | newstest2015-enru-ruseng.rus.eng | 25.6 | 0.527 |
+ | newstest2016-encs-ceseng.ces.eng | 27.0 | 0.540 |
+ | newstest2016-ende-deueng.deu.eng | 33.5 | 0.592 |
+ | newstest2016-enro-roneng.ron.eng | 32.8 | 0.591 |
+ | newstest2016-enru-ruseng.rus.eng | 24.8 | 0.523 |
+ | newstest2017-encs-ceseng.ces.eng | 23.7 | 0.510 |
+ | newstest2017-ende-deueng.deu.eng | 29.3 | 0.556 |
+ | newstest2017-enlv-laveng.lav.eng | 18.9 | 0.486 |
+ | newstest2017-enru-ruseng.rus.eng | 28.0 | 0.546 |
+ | newstest2018-encs-ceseng.ces.eng | 24.9 | 0.521 |
+ | newstest2018-ende-deueng.deu.eng | 36.0 | 0.604 |
+ | newstest2018-enru-ruseng.rus.eng | 23.8 | 0.517 |
+ | newstest2019-deen-deueng.deu.eng | 31.5 | 0.570 |
+ | newstest2019-guen-gujeng.guj.eng | 12.1 | 0.377 |
+ | newstest2019-lten-liteng.lit.eng | 26.6 | 0.555 |
+ | newstest2019-ruen-ruseng.rus.eng | 27.5 | 0.541 |
+ | Tatoeba-test.afr-eng.afr.eng | 59.0 | 0.724 |
+ | Tatoeba-test.ang-eng.ang.eng | 9.9 | 0.254 |
+ | Tatoeba-test.arg-eng.arg.eng | 41.6 | 0.487 |
+ | Tatoeba-test.asm-eng.asm.eng | 22.8 | 0.392 |
+ | Tatoeba-test.ast-eng.ast.eng | 36.1 | 0.521 |
+ | Tatoeba-test.awa-eng.awa.eng | 11.6 | 0.280 |
+ | Tatoeba-test.bel-eng.bel.eng | 42.2 | 0.597 |
+ | Tatoeba-test.ben-eng.ben.eng | 45.8 | 0.598 |
+ | Tatoeba-test.bho-eng.bho.eng | 34.4 | 0.518 |
+ | Tatoeba-test.bre-eng.bre.eng | 24.4 | 0.405 |
+ | Tatoeba-test.bul-eng.bul.eng | 50.8 | 0.660 |
+ | Tatoeba-test.cat-eng.cat.eng | 51.2 | 0.677 |
+ | Tatoeba-test.ces-eng.ces.eng | 47.6 | 0.641 |
+ | Tatoeba-test.cor-eng.cor.eng | 5.4 | 0.214 |
+ | Tatoeba-test.cos-eng.cos.eng | 61.0 | 0.675 |
+ | Tatoeba-test.csb-eng.csb.eng | 22.5 | 0.394 |
+ | Tatoeba-test.cym-eng.cym.eng | 34.7 | 0.522 |
+ | Tatoeba-test.dan-eng.dan.eng | 56.2 | 0.708 |
+ | Tatoeba-test.deu-eng.deu.eng | 44.9 | 0.625 |
+ | Tatoeba-test.dsb-eng.dsb.eng | 21.0 | 0.383 |
+ | Tatoeba-test.egl-eng.egl.eng | 6.9 | 0.221 |
+ | Tatoeba-test.ell-eng.ell.eng | 62.1 | 0.741 |
+ | Tatoeba-test.enm-eng.enm.eng | 22.6 | 0.466 |
+ | Tatoeba-test.ext-eng.ext.eng | 33.2 | 0.496 |
+ | Tatoeba-test.fao-eng.fao.eng | 28.1 | 0.460 |
+ | Tatoeba-test.fas-eng.fas.eng | 9.6 | 0.306 |
+ | Tatoeba-test.fra-eng.fra.eng | 50.3 | 0.661 |
+ | Tatoeba-test.frm-eng.frm.eng | 30.0 | 0.457 |
+ | Tatoeba-test.frr-eng.frr.eng | 15.2 | 0.301 |
+ | Tatoeba-test.fry-eng.fry.eng | 34.4 | 0.525 |
+ | Tatoeba-test.gcf-eng.gcf.eng | 18.4 | 0.317 |
+ | Tatoeba-test.gla-eng.gla.eng | 24.1 | 0.400 |
+ | Tatoeba-test.gle-eng.gle.eng | 52.2 | 0.671 |
+ | Tatoeba-test.glg-eng.glg.eng | 50.5 | 0.669 |
+ | Tatoeba-test.glv-eng.glv.eng | 5.7 | 0.189 |
+ | Tatoeba-test.gos-eng.gos.eng | 19.2 | 0.378 |
+ | Tatoeba-test.got-eng.got.eng | 0.1 | 0.022 |
+ | Tatoeba-test.grc-eng.grc.eng | 0.9 | 0.095 |
+ | Tatoeba-test.gsw-eng.gsw.eng | 23.9 | 0.390 |
+ | Tatoeba-test.guj-eng.guj.eng | 28.0 | 0.428 |
+ | Tatoeba-test.hat-eng.hat.eng | 44.2 | 0.567 |
+ | Tatoeba-test.hbs-eng.hbs.eng | 51.6 | 0.666 |
+ | Tatoeba-test.hif-eng.hif.eng | 22.3 | 0.451 |
+ | Tatoeba-test.hin-eng.hin.eng | 41.7 | 0.585 |
+ | Tatoeba-test.hsb-eng.hsb.eng | 46.4 | 0.590 |
+ | Tatoeba-test.hye-eng.hye.eng | 40.4 | 0.564 |
+ | Tatoeba-test.isl-eng.isl.eng | 43.8 | 0.605 |
+ | Tatoeba-test.ita-eng.ita.eng | 60.7 | 0.735 |
+ | Tatoeba-test.jdt-eng.jdt.eng | 5.5 | 0.091 |
+ | Tatoeba-test.kok-eng.kok.eng | 7.8 | 0.205 |
+ | Tatoeba-test.ksh-eng.ksh.eng | 15.8 | 0.284 |
+ | Tatoeba-test.kur-eng.kur.eng | 11.6 | 0.232 |
+ | Tatoeba-test.lad-eng.lad.eng | 30.7 | 0.484 |
+ | Tatoeba-test.lah-eng.lah.eng | 11.0 | 0.286 |
+ | Tatoeba-test.lat-eng.lat.eng | 24.4 | 0.432 |
+ | Tatoeba-test.lav-eng.lav.eng | 47.2 | 0.646 |
+ | Tatoeba-test.lij-eng.lij.eng | 9.0 | 0.287 |
+ | Tatoeba-test.lit-eng.lit.eng | 51.7 | 0.670 |
+ | Tatoeba-test.lld-eng.lld.eng | 22.4 | 0.369 |
+ | Tatoeba-test.lmo-eng.lmo.eng | 26.1 | 0.381 |
+ | Tatoeba-test.ltz-eng.ltz.eng | 39.8 | 0.536 |
+ | Tatoeba-test.mai-eng.mai.eng | 72.3 | 0.758 |
+ | Tatoeba-test.mar-eng.mar.eng | 32.0 | 0.554 |
+ | Tatoeba-test.mfe-eng.mfe.eng | 63.1 | 0.822 |
+ | Tatoeba-test.mkd-eng.mkd.eng | 49.5 | 0.638 |
+ | Tatoeba-test.msa-eng.msa.eng | 38.6 | 0.566 |
+ | Tatoeba-test.multi.eng | 45.6 | 0.615 |
+ | Tatoeba-test.mwl-eng.mwl.eng | 40.4 | 0.767 |
+ | Tatoeba-test.nds-eng.nds.eng | 35.5 | 0.538 |
+ | Tatoeba-test.nep-eng.nep.eng | 4.9 | 0.209 |
+ | Tatoeba-test.nld-eng.nld.eng | 54.2 | 0.694 |
+ | Tatoeba-test.non-eng.non.eng | 39.3 | 0.573 |
+ | Tatoeba-test.nor-eng.nor.eng | 50.9 | 0.663 |
+ | Tatoeba-test.oci-eng.oci.eng | 19.6 | 0.386 |
+ | Tatoeba-test.ori-eng.ori.eng | 16.2 | 0.364 |
+ | Tatoeba-test.orv-eng.orv.eng | 13.6 | 0.288 |
+ | Tatoeba-test.oss-eng.oss.eng | 9.4 | 0.301 |
+ | Tatoeba-test.pan-eng.pan.eng | 17.1 | 0.389 |
+ | Tatoeba-test.pap-eng.pap.eng | 57.0 | 0.680 |
+ | Tatoeba-test.pdc-eng.pdc.eng | 41.6 | 0.526 |
+ | Tatoeba-test.pms-eng.pms.eng | 13.7 | 0.333 |
+ | Tatoeba-test.pol-eng.pol.eng | 46.5 | 0.632 |
+ | Tatoeba-test.por-eng.por.eng | 56.4 | 0.710 |
+ | Tatoeba-test.prg-eng.prg.eng | 2.3 | 0.193 |
+ | Tatoeba-test.pus-eng.pus.eng | 3.2 | 0.194 |
+ | Tatoeba-test.roh-eng.roh.eng | 17.5 | 0.420 |
+ | Tatoeba-test.rom-eng.rom.eng | 5.0 | 0.237 |
+ | Tatoeba-test.ron-eng.ron.eng | 51.4 | 0.670 |
+ | Tatoeba-test.rue-eng.rue.eng | 26.0 | 0.447 |
+ | Tatoeba-test.rus-eng.rus.eng | 47.8 | 0.634 |
+ | Tatoeba-test.san-eng.san.eng | 4.0 | 0.195 |
+ | Tatoeba-test.scn-eng.scn.eng | 45.1 | 0.440 |
+ | Tatoeba-test.sco-eng.sco.eng | 41.9 | 0.582 |
+ | Tatoeba-test.sgs-eng.sgs.eng | 38.7 | 0.498 |
+ | Tatoeba-test.sin-eng.sin.eng | 29.7 | 0.499 |
+ | Tatoeba-test.slv-eng.slv.eng | 38.2 | 0.564 |
+ | Tatoeba-test.snd-eng.snd.eng | 12.7 | 0.342 |
+ | Tatoeba-test.spa-eng.spa.eng | 53.2 | 0.687 |
+ | Tatoeba-test.sqi-eng.sqi.eng | 51.9 | 0.679 |
+ | Tatoeba-test.stq-eng.stq.eng | 9.0 | 0.391 |
+ | Tatoeba-test.swe-eng.swe.eng | 57.4 | 0.705 |
+ | Tatoeba-test.swg-eng.swg.eng | 18.0 | 0.338 |
+ | Tatoeba-test.tgk-eng.tgk.eng | 24.3 | 0.413 |
+ | Tatoeba-test.tly-eng.tly.eng | 1.1 | 0.094 |
+ | Tatoeba-test.ukr-eng.ukr.eng | 48.0 | 0.639 |
+ | Tatoeba-test.urd-eng.urd.eng | 27.2 | 0.471 |
+ | Tatoeba-test.vec-eng.vec.eng | 28.0 | 0.398 |
+ | Tatoeba-test.wln-eng.wln.eng | 17.5 | 0.320 |
+ | Tatoeba-test.yid-eng.yid.eng | 26.9 | 0.457 |
+ | Tatoeba-test.zza-eng.zza.eng | 1.7 | 0.131 |
+
+
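Because the benchmark table spans well over a hundred test sets, it can help to load it into a small structure and sort or filter by score. The following is a minimal sketch, not part of the original README: `Score` and `parse_rows` are hypothetical names, and `ROWS` holds only a few rows copied from the table above. The parser tolerates a leading `+` diff marker and skips the header and separator rows.

```python
from typing import List, NamedTuple


class Score(NamedTuple):
    testset: str
    bleu: float
    chrf: float


def parse_rows(table: str) -> List[Score]:
    """Parse markdown table rows of the form '| testset | BLEU | chr-F |'."""
    scores: List[Score] = []
    for line in table.splitlines():
        # Drop an optional leading "+ " diff marker, then the outer pipes.
        cells = [c.strip() for c in line.lstrip("+ ").strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        try:
            scores.append(Score(cells[0], float(cells[1]), float(cells[2])))
        except ValueError:
            continue  # header or separator row, not a score row
    return scores


# A few rows copied from the table above (excerpt only).
ROWS = """\
| Tatoeba-test.afr-eng.afr.eng | 59.0 | 0.724 |
| Tatoeba-test.got-eng.got.eng | 0.1 | 0.022 |
| Tatoeba-test.mai-eng.mai.eng | 72.3 | 0.758 |
| Tatoeba-test.multi.eng | 45.6 | 0.615 |
| newstest2019-guen-gujeng.guj.eng | 12.1 | 0.377 |
"""

scores = parse_rows(ROWS)
best = max(scores, key=lambda s: s.bleu)
worst = min(scores, key=lambda s: s.bleu)

if __name__ == "__main__":
    print("highest BLEU in excerpt:", best)
    print("lowest BLEU in excerpt:", worst)
```

The spread in the excerpt (0.1 BLEU for `got` up to 72.3 for `mai`) shows why the aggregate `Tatoeba-test.multi.eng` score alone says little about any single source language. The table does not report test-set sizes, so very high or very low per-language scores should be read with caution.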
+ ### System Info:
+ - hf_name: ine-eng
+
+ - source_languages: ine
+
+ - target_languages: eng
+
+ - opus_readme_url: https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/ine-eng/README.md
+
+ - original_repo: Tatoeba-Challenge
+
+ - tags: ['translation']
+
+ - prepro: normalization + SentencePiece (spm32k,spm32k)
+
+ - url_model: https://object.pouta.csc.fi/Tatoeba-MT-models/ine-eng/opus2m-2020-08-01.zip
+
+ - url_test_set: https://object.pouta.csc.fi/Tatoeba-MT-models/ine-eng/opus2m-2020-08-01.test.txt
+
+ - src_alpha3: ine
+
+ - tgt_alpha3: eng
+
+ - short_pair: ine-en
+
+ - chrF2_score: 0.615
+
+ - bleu: 45.6
+
+ - brevity_penalty: 0.997
+
+ - ref_len: 71872.0
+
+ - src_name: Indo-European languages
+
+ - tgt_name: English
+
+ - train_date: 2020-08-01
+
+ - src_alpha2: ine
+
+ - tgt_alpha2: en
+
+ - prefer_old: False
+
+ - long_pair: ine-eng
+
+ - helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535
+
+ - transformers_git_sha: 46e9f53347bbe9e989f0335f98465f30886d8173
+
+ - port_machine: brutasse
+
+ - port_time: 2020-08-18-01:48
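The `brevity_penalty` and `ref_len` fields above can be related through the standard BLEU definition, where the penalty is 1 when the system output is longer than the reference and exp(1 - r/c) otherwise (r is the reference length, c the output length). The sketch below inverts that formula to estimate the total output length. It assumes the README's values were computed with this standard definition, which the README itself does not confirm; `brevity_penalty` and `implied_hyp_len` are hypothetical helper names.

```python
import math


def brevity_penalty(hyp_len: float, ref_len: float) -> float:
    """Standard BLEU brevity penalty: 1 if c > r, else exp(1 - r/c)."""
    if hyp_len <= 0:
        return 0.0
    if hyp_len > ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


def implied_hyp_len(bp: float, ref_len: float) -> float:
    """Invert the penalty for 0 < bp < 1: c = r / (1 - ln(bp))."""
    if not 0.0 < bp < 1.0:
        raise ValueError("the output length is only recoverable when 0 < bp < 1")
    return ref_len / (1.0 - math.log(bp))


REF_LEN = 71872.0  # ref_len from System Info
BP = 0.997         # brevity_penalty from System Info (rounded to 3 decimals)

hyp_len = implied_hyp_len(BP, REF_LEN)

if __name__ == "__main__":
    print(f"estimated output length: {hyp_len:.0f} tokens")
    print(f"relative shortfall: {1 - hyp_len / REF_LEN:.2%}")
```

Under these assumptions the system output is only a fraction of a percent shorter than the reference, so the penalty has little effect on the reported BLEU. Since the penalty is given to three decimals, the recovered length is an estimate rather than an exact count.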