Commit 69465e4 by tomaarsen
1 Parent(s): 5aa14b1

Add new SentenceTransformer model.
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,1256 @@
1
+ ---
2
+ language:
3
+ - en
4
+ - multilingual
5
+ - ar
6
+ - bg
7
+ - ca
8
+ - cs
9
+ - da
10
+ - de
11
+ - el
12
+ - es
13
+ - et
14
+ - fa
15
+ - fi
16
+ - fr
17
+ - gl
18
+ - gu
19
+ - he
20
+ - hi
21
+ - hr
22
+ - hu
23
+ - hy
24
+ - id
25
+ - it
26
+ - ja
27
+ - ka
28
+ - ko
29
+ - ku
30
+ - lt
31
+ - lv
32
+ - mk
33
+ - mn
34
+ - mr
35
+ - ms
36
+ - my
37
+ - nb
38
+ - nl
39
+ - pl
40
+ - pt
41
+ - ro
42
+ - ru
43
+ - sk
44
+ - sl
45
+ - sq
46
+ - sr
47
+ - sv
48
+ - th
49
+ - tr
50
+ - uk
51
+ - ur
52
+ - vi
53
+ - zh
54
+ library_name: sentence-transformers
55
+ tags:
56
+ - sentence-transformers
57
+ - sentence-similarity
58
+ - feature-extraction
59
+ - loss:MSELoss
60
+ base_model: FacebookAI/xlm-roberta-base
61
+ metrics:
62
+ - negative_mse
63
+ - src2trg_accuracy
64
+ - trg2src_accuracy
65
+ - mean_accuracy
66
+ - pearson_cosine
67
+ - spearman_cosine
68
+ - pearson_manhattan
69
+ - spearman_manhattan
70
+ - pearson_euclidean
71
+ - spearman_euclidean
72
+ - pearson_dot
73
+ - spearman_dot
74
+ - pearson_max
75
+ - spearman_max
76
+ widget:
77
+ - source_sentence: Grazie tante.
78
+ sentences:
79
+ - Grazie infinite.
80
+ - Non c'è un solo architetto diplomato in tutta la Contea.
81
+ - Le aziende non credevano che fosse loro responsabilità.
82
+ - source_sentence: Avance rapide.
83
+ sentences:
84
+ - Très bien.
85
+ - Donc, je voulais faire quelque chose de spécial aujourd'hui.
86
+ - Et ils ne tiennent pas non plus compte des civils qui souffrent de façon plus
87
+ générale.
88
+ - source_sentence: E' importante.
89
+ sentences:
90
+ - E' una materia fondamentale.
91
+ - Sono qui oggi per mostrare le mie fotografie dei Lakota.
92
+ - Non ero seguito da un corteo di macchine.
93
+ - source_sentence: Müfettişler…
94
+ sentences:
95
+ - İşçi sınıfına dair birşey.
96
+ - Antlaşmaya göre, o topraklar bağımsız bir ulustur.
97
+ - Son derece düz ve bataklık bir coğrafya.
98
+ - source_sentence: Wir sind eins.
99
+ sentences:
100
+ - Das versuchen wir zu bieten.
101
+ - Ihre Gehirne sind ungefähr 100 Millionen Mal komplizierter.
102
+ - Hinter mir war gar keine Autokolonne.
103
+ pipeline_tag: sentence-similarity
104
+ co2_eq_emissions:
105
+ emissions: 23.27766676567869
106
+ energy_consumed: 0.05988563672345058
107
+ source: codecarbon
108
+ training_type: fine-tuning
109
+ on_cloud: false
110
+ cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
111
+ ram_total_size: 31.777088165283203
112
+ hours_used: 0.179
113
+ hardware_used: 1 x NVIDIA GeForce RTX 3090
114
+ model-index:
115
+ - name: SentenceTransformer based on FacebookAI/xlm-roberta-base
116
+ results:
117
+ - task:
118
+ type: knowledge-distillation
119
+ name: Knowledge Distillation
120
+ dataset:
121
+ name: en ar
122
+ type: en-ar
123
+ metrics:
124
+ - type: negative_mse
125
+ value: -20.395545661449432
126
+ name: Negative Mse
127
+ - task:
128
+ type: translation
129
+ name: Translation
130
+ dataset:
131
+ name: en ar
132
+ type: en-ar
133
+ metrics:
134
+ - type: src2trg_accuracy
135
+ value: 0.7603222557905337
136
+ name: Src2Trg Accuracy
137
+ - type: trg2src_accuracy
138
+ value: 0.7824773413897281
139
+ name: Trg2Src Accuracy
140
+ - type: mean_accuracy
141
+ value: 0.7713997985901309
142
+ name: Mean Accuracy
143
+ - task:
144
+ type: semantic-similarity
145
+ name: Semantic Similarity
146
+ dataset:
147
+ name: sts17 en ar test
148
+ type: sts17-en-ar-test
149
+ metrics:
150
+ - type: pearson_cosine
151
+ value: 0.40984231242712876
152
+ name: Pearson Cosine
153
+ - type: spearman_cosine
154
+ value: 0.4425400227662121
155
+ name: Spearman Cosine
156
+ - type: pearson_manhattan
157
+ value: 0.4068582195810505
158
+ name: Pearson Manhattan
159
+ - type: spearman_manhattan
160
+ value: 0.4194184278683204
161
+ name: Spearman Manhattan
162
+ - type: pearson_euclidean
163
+ value: 0.38014538983821944
164
+ name: Pearson Euclidean
165
+ - type: spearman_euclidean
166
+ value: 0.38651157412220366
167
+ name: Spearman Euclidean
168
+ - type: pearson_dot
169
+ value: 0.4077636003696869
170
+ name: Pearson Dot
171
+ - type: spearman_dot
172
+ value: 0.37682818098716137
173
+ name: Spearman Dot
174
+ - type: pearson_max
175
+ value: 0.40984231242712876
176
+ name: Pearson Max
177
+ - type: spearman_max
178
+ value: 0.4425400227662121
179
+ name: Spearman Max
180
+ - task:
181
+ type: knowledge-distillation
182
+ name: Knowledge Distillation
183
+ dataset:
184
+ name: en fr
185
+ type: en-fr
186
+ metrics:
187
+ - type: negative_mse
188
+ value: -19.62321847677231
189
+ name: Negative Mse
190
+ - task:
191
+ type: translation
192
+ name: Translation
193
+ dataset:
194
+ name: en fr
195
+ type: en-fr
196
+ metrics:
197
+ - type: src2trg_accuracy
198
+ value: 0.8981854838709677
199
+ name: Src2Trg Accuracy
200
+ - type: trg2src_accuracy
201
+ value: 0.8901209677419355
202
+ name: Trg2Src Accuracy
203
+ - type: mean_accuracy
204
+ value: 0.8941532258064516
205
+ name: Mean Accuracy
206
+ - task:
207
+ type: semantic-similarity
208
+ name: Semantic Similarity
209
+ dataset:
210
+ name: sts17 fr en test
211
+ type: sts17-fr-en-test
212
+ metrics:
213
+ - type: pearson_cosine
214
+ value: 0.5017606394120642
215
+ name: Pearson Cosine
216
+ - type: spearman_cosine
217
+ value: 0.5333594401322842
218
+ name: Spearman Cosine
219
+ - type: pearson_manhattan
220
+ value: 0.4461108010622129
221
+ name: Pearson Manhattan
222
+ - type: spearman_manhattan
223
+ value: 0.45470883061015244
224
+ name: Spearman Manhattan
225
+ - type: pearson_euclidean
226
+ value: 0.44313058261278737
227
+ name: Pearson Euclidean
228
+ - type: spearman_euclidean
229
+ value: 0.44806261424208443
230
+ name: Spearman Euclidean
231
+ - type: pearson_dot
232
+ value: 0.40165874540768454
233
+ name: Pearson Dot
234
+ - type: spearman_dot
235
+ value: 0.41339619568003433
236
+ name: Spearman Dot
237
+ - type: pearson_max
238
+ value: 0.5017606394120642
239
+ name: Pearson Max
240
+ - type: spearman_max
241
+ value: 0.5333594401322842
242
+ name: Spearman Max
243
+ - task:
244
+ type: knowledge-distillation
245
+ name: Knowledge Distillation
246
+ dataset:
247
+ name: en de
248
+ type: en-de
249
+ metrics:
250
+ - type: negative_mse
251
+ value: -19.727922976017
252
+ name: Negative Mse
253
+ - task:
254
+ type: translation
255
+ name: Translation
256
+ dataset:
257
+ name: en de
258
+ type: en-de
259
+ metrics:
260
+ - type: src2trg_accuracy
261
+ value: 0.8920282542885973
262
+ name: Src2Trg Accuracy
263
+ - type: trg2src_accuracy
264
+ value: 0.8910191725529768
265
+ name: Trg2Src Accuracy
266
+ - type: mean_accuracy
267
+ value: 0.8915237134207871
268
+ name: Mean Accuracy
269
+ - task:
270
+ type: semantic-similarity
271
+ name: Semantic Similarity
272
+ dataset:
273
+ name: sts17 en de test
274
+ type: sts17-en-de-test
275
+ metrics:
276
+ - type: pearson_cosine
277
+ value: 0.5262798164154752
278
+ name: Pearson Cosine
279
+ - type: spearman_cosine
280
+ value: 0.5618005565496922
281
+ name: Spearman Cosine
282
+ - type: pearson_manhattan
283
+ value: 0.5084907192868734
284
+ name: Pearson Manhattan
285
+ - type: spearman_manhattan
286
+ value: 0.5218456102379673
287
+ name: Spearman Manhattan
288
+ - type: pearson_euclidean
289
+ value: 0.5055278909013912
290
+ name: Pearson Euclidean
291
+ - type: spearman_euclidean
292
+ value: 0.5206420646365548
293
+ name: Spearman Euclidean
294
+ - type: pearson_dot
295
+ value: 0.3742195121194434
296
+ name: Pearson Dot
297
+ - type: spearman_dot
298
+ value: 0.3691237073066472
299
+ name: Spearman Dot
300
+ - type: pearson_max
301
+ value: 0.5262798164154752
302
+ name: Pearson Max
303
+ - type: spearman_max
304
+ value: 0.5618005565496922
305
+ name: Spearman Max
306
+ - task:
307
+ type: knowledge-distillation
308
+ name: Knowledge Distillation
309
+ dataset:
310
+ name: en es
311
+ type: en-es
312
+ metrics:
313
+ - type: negative_mse
314
+ value: -19.472387433052063
315
+ name: Negative Mse
316
+ - task:
317
+ type: translation
318
+ name: Translation
319
+ dataset:
320
+ name: en es
321
+ type: en-es
322
+ metrics:
323
+ - type: src2trg_accuracy
324
+ value: 0.9434343434343434
325
+ name: Src2Trg Accuracy
326
+ - type: trg2src_accuracy
327
+ value: 0.9464646464646465
328
+ name: Trg2Src Accuracy
329
+ - type: mean_accuracy
330
+ value: 0.944949494949495
331
+ name: Mean Accuracy
332
+ - task:
333
+ type: semantic-similarity
334
+ name: Semantic Similarity
335
+ dataset:
336
+ name: sts17 es en test
337
+ type: sts17-es-en-test
338
+ metrics:
339
+ - type: pearson_cosine
340
+ value: 0.4944989376773328
341
+ name: Pearson Cosine
342
+ - type: spearman_cosine
343
+ value: 0.502096516024397
344
+ name: Spearman Cosine
345
+ - type: pearson_manhattan
346
+ value: 0.44447965250345656
347
+ name: Pearson Manhattan
348
+ - type: spearman_manhattan
349
+ value: 0.428444032581959
350
+ name: Spearman Manhattan
351
+ - type: pearson_euclidean
352
+ value: 0.43569887867301704
353
+ name: Pearson Euclidean
354
+ - type: spearman_euclidean
355
+ value: 0.4169602915053127
356
+ name: Spearman Euclidean
357
+ - type: pearson_dot
358
+ value: 0.3751122541083453
359
+ name: Pearson Dot
360
+ - type: spearman_dot
361
+ value: 0.37961391381473436
362
+ name: Spearman Dot
363
+ - type: pearson_max
364
+ value: 0.4944989376773328
365
+ name: Pearson Max
366
+ - type: spearman_max
367
+ value: 0.502096516024397
368
+ name: Spearman Max
369
+ - task:
370
+ type: knowledge-distillation
371
+ name: Knowledge Distillation
372
+ dataset:
373
+ name: en tr
374
+ type: en-tr
375
+ metrics:
376
+ - type: negative_mse
377
+ value: -20.754697918891907
378
+ name: Negative Mse
379
+ - task:
380
+ type: translation
381
+ name: Translation
382
+ dataset:
383
+ name: en tr
384
+ type: en-tr
385
+ metrics:
386
+ - type: src2trg_accuracy
387
+ value: 0.743202416918429
388
+ name: Src2Trg Accuracy
389
+ - type: trg2src_accuracy
390
+ value: 0.743202416918429
391
+ name: Trg2Src Accuracy
392
+ - type: mean_accuracy
393
+ value: 0.743202416918429
394
+ name: Mean Accuracy
395
+ - task:
396
+ type: semantic-similarity
397
+ name: Semantic Similarity
398
+ dataset:
399
+ name: sts17 en tr test
400
+ type: sts17-en-tr-test
401
+ metrics:
402
+ - type: pearson_cosine
403
+ value: 0.5544917743538167
404
+ name: Pearson Cosine
405
+ - type: spearman_cosine
406
+ value: 0.581923120433332
407
+ name: Spearman Cosine
408
+ - type: pearson_manhattan
409
+ value: 0.5103770986779784
410
+ name: Pearson Manhattan
411
+ - type: spearman_manhattan
412
+ value: 0.5087986920849596
413
+ name: Spearman Manhattan
414
+ - type: pearson_euclidean
415
+ value: 0.5045523005860614
416
+ name: Pearson Euclidean
417
+ - type: spearman_euclidean
418
+ value: 0.5053157708914061
419
+ name: Spearman Euclidean
420
+ - type: pearson_dot
421
+ value: 0.47262046401401747
422
+ name: Pearson Dot
423
+ - type: spearman_dot
424
+ value: 0.4297595645819756
425
+ name: Spearman Dot
426
+ - type: pearson_max
427
+ value: 0.5544917743538167
428
+ name: Pearson Max
429
+ - type: spearman_max
430
+ value: 0.581923120433332
431
+ name: Spearman Max
432
+ - task:
433
+ type: knowledge-distillation
434
+ name: Knowledge Distillation
435
+ dataset:
436
+ name: en it
437
+ type: en-it
438
+ metrics:
439
+ - type: negative_mse
440
+ value: -19.76993829011917
441
+ name: Negative Mse
442
+ - task:
443
+ type: translation
444
+ name: Translation
445
+ dataset:
446
+ name: en it
447
+ type: en-it
448
+ metrics:
449
+ - type: src2trg_accuracy
450
+ value: 0.878147029204431
451
+ name: Src2Trg Accuracy
452
+ - type: trg2src_accuracy
453
+ value: 0.8831822759315207
454
+ name: Trg2Src Accuracy
455
+ - type: mean_accuracy
456
+ value: 0.8806646525679758
457
+ name: Mean Accuracy
458
+ - task:
459
+ type: semantic-similarity
460
+ name: Semantic Similarity
461
+ dataset:
462
+ name: sts17 it en test
463
+ type: sts17-it-en-test
464
+ metrics:
465
+ - type: pearson_cosine
466
+ value: 0.506365733914274
467
+ name: Pearson Cosine
468
+ - type: spearman_cosine
469
+ value: 0.5250284136808592
470
+ name: Spearman Cosine
471
+ - type: pearson_manhattan
472
+ value: 0.45167598168533407
473
+ name: Pearson Manhattan
474
+ - type: spearman_manhattan
475
+ value: 0.46227952068355316
476
+ name: Spearman Manhattan
477
+ - type: pearson_euclidean
478
+ value: 0.4423426674780287
479
+ name: Pearson Euclidean
480
+ - type: spearman_euclidean
481
+ value: 0.45072801992723094
482
+ name: Spearman Euclidean
483
+ - type: pearson_dot
484
+ value: 0.4201989776020174
485
+ name: Pearson Dot
486
+ - type: spearman_dot
487
+ value: 0.42253906764732746
488
+ name: Spearman Dot
489
+ - type: pearson_max
490
+ value: 0.506365733914274
491
+ name: Pearson Max
492
+ - type: spearman_max
493
+ value: 0.5250284136808592
494
+ name: Spearman Max
495
+ ---
496
+
497
+ # SentenceTransformer based on FacebookAI/xlm-roberta-base
498
+
499
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) on the [en-ar](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks), [en-fr](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks), [en-de](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks), [en-es](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks), [en-tr](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) and [en-it](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
500
+
501
+ ## Model Details
502
+
503
+ ### Model Description
504
+ - **Model Type:** Sentence Transformer
505
+ - **Base model:** [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) <!-- at revision e73636d4f797dec63c3081bb6ed5c7b0bb3f2089 -->
506
+ - **Maximum Sequence Length:** 128 tokens
507
+ - **Output Dimensionality:** 768 dimensions
508
+ - **Similarity Function:** Cosine Similarity
509
+ - **Training Datasets:**
510
+ - [en-ar](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks)
511
+ - [en-fr](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks)
512
+ - [en-de](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks)
513
+ - [en-es](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks)
514
+ - [en-tr](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks)
515
+ - [en-it](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks)
516
+ - **Languages:** en, multilingual, ar, bg, ca, cs, da, de, el, es, et, fa, fi, fr, gl, gu, he, hi, hr, hu, hy, id, it, ja, ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, ro, ru, sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh
517
+ <!-- - **License:** Unknown -->
518
+
519
+ ### Model Sources
520
+
521
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
522
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
523
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
524
+
525
+ ### Full Model Architecture
526
+
527
+ ```
528
+ SentenceTransformer(
529
+ (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
530
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
531
+ )
532
+ ```
533
+
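+ For illustration, the same two-module stack can be assembled by hand from the `sentence_transformers.models` building blocks. This is a minimal sketch and is not needed in practice; loading the released checkpoint as shown under Usage below is the intended route.
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer, models
+ 
+ # Recreate the architecture printed above: XLM-R encoder followed by mean pooling.
+ transformer = models.Transformer("FacebookAI/xlm-roberta-base", max_seq_length=128)
+ pooling = models.Pooling(
+     transformer.get_word_embedding_dimension(),  # 768
+     pooling_mode_mean_tokens=True,
+ )
+ model = SentenceTransformer(modules=[transformer, pooling])
+ ```
+ 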
534
+ ## Usage
535
+
536
+ ### Direct Usage (Sentence Transformers)
537
+
538
+ First install the Sentence Transformers library:
539
+
540
+ ```bash
541
+ pip install -U sentence-transformers
542
+ ```
543
+
544
+ Then you can load this model and run inference.
545
+ ```python
546
+ from sentence_transformers import SentenceTransformer
547
+
548
+ # Download from the 🤗 Hub
549
+ model = SentenceTransformer("tomaarsen/xlm-roberta-base-multilingual-en-ar-fr-de-es-tr-it")
550
+ # Run inference
551
+ sentences = [
552
+ 'Wir sind eins.',
553
+ 'Das versuchen wir zu bieten.',
554
+ 'Ihre Gehirne sind ungefähr 100 Millionen Mal komplizierter.',
555
+ ]
556
+ embeddings = model.encode(sentences)
557
+ print(embeddings.shape)
558
+ # [3, 768]
559
+
560
+ # Get the similarity scores for the embeddings
561
+ similarities = model.similarity(embeddings, embeddings)
562
+ print(similarities.shape)
563
+ # [3, 3]
564
+ ```
565
+
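+ Because the model embeds all supported languages into one vector space, a quick cross-lingual check (reusing the widget sentences above) is a natural follow-up. This snippet is illustrative only:
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ model = SentenceTransformer("tomaarsen/xlm-roberta-base-multilingual-en-ar-fr-de-es-tr-it")
+ # An English query scored against two German candidates from the widget examples.
+ query_embedding = model.encode(["We are one."])
+ candidate_embeddings = model.encode(["Wir sind eins.", "Hinter mir war gar keine Autokolonne."])
+ print(model.similarity(query_embedding, candidate_embeddings))
+ # The first candidate (the German translation of the query) should score higher.
+ ```
+ 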
566
+ <!--
567
+ ### Direct Usage (Transformers)
568
+
569
+ <details><summary>Click to see the direct usage in Transformers</summary>
570
+
571
+ </details>
572
+ -->
573
+
574
+ <!--
575
+ ### Downstream Usage (Sentence Transformers)
576
+
577
+ You can finetune this model on your own dataset.
578
+
579
+ <details><summary>Click to expand</summary>
580
+
581
+ </details>
582
+ -->
583
+
584
+ <!--
585
+ ### Out-of-Scope Use
586
+
587
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
588
+ -->
589
+
590
+ ## Evaluation
591
+
592
+ ### Metrics
593
+
594
+ #### Knowledge Distillation
595
+ * Dataset: `en-ar`
596
+ * Evaluated with [<code>MSEEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)
597
+
598
+ | Metric | Value |
599
+ |:-----------------|:-------------|
600
+ | **negative_mse** | **-20.3955** |
601
+
602
+ #### Translation
603
+ * Dataset: `en-ar`
604
+ * Evaluated with [<code>TranslationEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.TranslationEvaluator)
605
+
606
+ | Metric | Value |
607
+ |:------------------|:-----------|
608
+ | src2trg_accuracy | 0.7603 |
609
+ | trg2src_accuracy | 0.7825 |
610
+ | **mean_accuracy** | **0.7714** |
611
+
612
+ #### Semantic Similarity
613
+ * Dataset: `sts17-en-ar-test`
614
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
615
+
616
+ | Metric | Value |
617
+ |:-------------------|:-----------|
618
+ | pearson_cosine | 0.4098 |
619
+ | spearman_cosine | 0.4425 |
620
+ | pearson_manhattan | 0.4069 |
621
+ | spearman_manhattan | 0.4194 |
622
+ | pearson_euclidean | 0.3801 |
623
+ | spearman_euclidean | 0.3865 |
624
+ | pearson_dot | 0.4078 |
625
+ | spearman_dot | 0.3768 |
626
+ | pearson_max | 0.4098 |
627
+ | **spearman_max** | **0.4425** |
628
+
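+ The tables above come from evaluators in `sentence_transformers.evaluation`. The sketch below outlines an equivalent en-ar evaluation; the two sentence pairs and the teacher model are placeholders (this card does not state which teacher produced the distillation targets), so read it as an outline rather than the exact setup.
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import MSEEvaluator, TranslationEvaluator
+ 
+ # Placeholder parallel data: English sources with their Arabic translations.
+ en_sentences = ["Thank you so much, Chris.", "It is truly a great honor to be here."]
+ ar_sentences = ["شكرا جزيلا كريس.", "انه فعلا شرف عظيم لي ان أصعد المنصة."]
+ 
+ student = SentenceTransformer("tomaarsen/xlm-roberta-base-multilingual-en-ar-fr-de-es-tr-it")
+ # Assumed 768-dimensional teacher; the real distillation teacher is not named in this card.
+ teacher = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
+ 
+ mse_evaluator = MSEEvaluator(en_sentences, ar_sentences, teacher_model=teacher, name="en-ar")
+ translation_evaluator = TranslationEvaluator(en_sentences, ar_sentences, name="en-ar")
+ print(mse_evaluator(student))          # negative MSE between teacher and student embeddings
+ print(translation_evaluator(student))  # src2trg / trg2src retrieval accuracy
+ ```
+ 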
629
+ #### Knowledge Distillation
630
+ * Dataset: `en-fr`
631
+ * Evaluated with [<code>MSEEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)
632
+
633
+ | Metric | Value |
634
+ |:-----------------|:-------------|
635
+ | **negative_mse** | **-19.6232** |
636
+
637
+ #### Translation
638
+ * Dataset: `en-fr`
639
+ * Evaluated with [<code>TranslationEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.TranslationEvaluator)
640
+
641
+ | Metric | Value |
642
+ |:------------------|:-----------|
643
+ | src2trg_accuracy | 0.8982 |
644
+ | trg2src_accuracy | 0.8901 |
645
+ | **mean_accuracy** | **0.8942** |
646
+
647
+ #### Semantic Similarity
648
+ * Dataset: `sts17-fr-en-test`
649
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
650
+
651
+ | Metric | Value |
652
+ |:-------------------|:-----------|
653
+ | pearson_cosine | 0.5018 |
654
+ | spearman_cosine | 0.5334 |
655
+ | pearson_manhattan | 0.4461 |
656
+ | spearman_manhattan | 0.4547 |
657
+ | pearson_euclidean | 0.4431 |
658
+ | spearman_euclidean | 0.4481 |
659
+ | pearson_dot | 0.4017 |
660
+ | spearman_dot | 0.4134 |
661
+ | pearson_max | 0.5018 |
662
+ | **spearman_max** | **0.5334** |
663
+
664
+ #### Knowledge Distillation
665
+ * Dataset: `en-de`
666
+ * Evaluated with [<code>MSEEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)
667
+
668
+ | Metric | Value |
669
+ |:-----------------|:-------------|
670
+ | **negative_mse** | **-19.7279** |
671
+
672
+ #### Translation
673
+ * Dataset: `en-de`
674
+ * Evaluated with [<code>TranslationEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.TranslationEvaluator)
675
+
676
+ | Metric | Value |
677
+ |:------------------|:-----------|
678
+ | src2trg_accuracy | 0.892 |
679
+ | trg2src_accuracy | 0.891 |
680
+ | **mean_accuracy** | **0.8915** |
681
+
682
+ #### Semantic Similarity
683
+ * Dataset: `sts17-en-de-test`
684
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
685
+
686
+ | Metric | Value |
687
+ |:-------------------|:-----------|
688
+ | pearson_cosine | 0.5263 |
689
+ | spearman_cosine | 0.5618 |
690
+ | pearson_manhattan | 0.5085 |
691
+ | spearman_manhattan | 0.5218 |
692
+ | pearson_euclidean | 0.5055 |
693
+ | spearman_euclidean | 0.5206 |
694
+ | pearson_dot | 0.3742 |
695
+ | spearman_dot | 0.3691 |
696
+ | pearson_max | 0.5263 |
697
+ | **spearman_max** | **0.5618** |
698
+
699
+ #### Knowledge Distillation
700
+ * Dataset: `en-es`
701
+ * Evaluated with [<code>MSEEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)
702
+
703
+ | Metric | Value |
704
+ |:-----------------|:-------------|
705
+ | **negative_mse** | **-19.4724** |
706
+
707
+ #### Translation
708
+ * Dataset: `en-es`
709
+ * Evaluated with [<code>TranslationEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.TranslationEvaluator)
710
+
711
+ | Metric | Value |
712
+ |:------------------|:-----------|
713
+ | src2trg_accuracy | 0.9434 |
714
+ | trg2src_accuracy | 0.9465 |
715
+ | **mean_accuracy** | **0.9449** |
716
+
717
+ #### Semantic Similarity
718
+ * Dataset: `sts17-es-en-test`
719
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
720
+
721
+ | Metric | Value |
722
+ |:-------------------|:-----------|
723
+ | pearson_cosine | 0.4945 |
724
+ | spearman_cosine | 0.5021 |
725
+ | pearson_manhattan | 0.4445 |
726
+ | spearman_manhattan | 0.4284 |
727
+ | pearson_euclidean | 0.4357 |
728
+ | spearman_euclidean | 0.417 |
729
+ | pearson_dot | 0.3751 |
730
+ | spearman_dot | 0.3796 |
731
+ | pearson_max | 0.4945 |
732
+ | **spearman_max** | **0.5021** |
733
+
734
+ #### Knowledge Distillation
735
+ * Dataset: `en-tr`
736
+ * Evaluated with [<code>MSEEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)
737
+
738
+ | Metric | Value |
739
+ |:-----------------|:-------------|
740
+ | **negative_mse** | **-20.7547** |
741
+
742
+ #### Translation
743
+ * Dataset: `en-tr`
744
+ * Evaluated with [<code>TranslationEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.TranslationEvaluator)
745
+
746
+ | Metric | Value |
747
+ |:------------------|:-----------|
748
+ | src2trg_accuracy | 0.7432 |
749
+ | trg2src_accuracy | 0.7432 |
750
+ | **mean_accuracy** | **0.7432** |
751
+
752
+ #### Semantic Similarity
753
+ * Dataset: `sts17-en-tr-test`
754
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
755
+
756
+ | Metric | Value |
757
+ |:-------------------|:-----------|
758
+ | pearson_cosine | 0.5545 |
759
+ | spearman_cosine | 0.5819 |
760
+ | pearson_manhattan | 0.5104 |
761
+ | spearman_manhattan | 0.5088 |
762
+ | pearson_euclidean | 0.5046 |
763
+ | spearman_euclidean | 0.5053 |
764
+ | pearson_dot | 0.4726 |
765
+ | spearman_dot | 0.4298 |
766
+ | pearson_max | 0.5545 |
767
+ | **spearman_max** | **0.5819** |
768
+
769
+ #### Knowledge Distillation
770
+ * Dataset: `en-it`
771
+ * Evaluated with [<code>MSEEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)
772
+
773
+ | Metric | Value |
774
+ |:-----------------|:-------------|
775
+ | **negative_mse** | **-19.7699** |
776
+
777
+ #### Translation
778
+ * Dataset: `en-it`
779
+ * Evaluated with [<code>TranslationEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.TranslationEvaluator)
780
+
781
+ | Metric | Value |
782
+ |:------------------|:-----------|
783
+ | src2trg_accuracy | 0.8781 |
784
+ | trg2src_accuracy | 0.8832 |
785
+ | **mean_accuracy** | **0.8807** |
786
+
787
+ #### Semantic Similarity
788
+ * Dataset: `sts17-it-en-test`
789
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
790
+
791
+ | Metric | Value |
792
+ |:-------------------|:----------|
793
+ | pearson_cosine | 0.5064 |
794
+ | spearman_cosine | 0.525 |
795
+ | pearson_manhattan | 0.4517 |
796
+ | spearman_manhattan | 0.4623 |
797
+ | pearson_euclidean | 0.4423 |
798
+ | spearman_euclidean | 0.4507 |
799
+ | pearson_dot | 0.4202 |
800
+ | spearman_dot | 0.4225 |
801
+ | pearson_max | 0.5064 |
802
+ | **spearman_max** | **0.525** |
803
+
804
+ <!--
805
+ ## Bias, Risks and Limitations
806
+
807
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
808
+ -->
809
+
810
+ <!--
811
+ ### Recommendations
812
+
813
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
814
+ -->
815
+
816
+ ## Training Details
817
+
818
+ ### Training Datasets
819
+
820
+ #### en-ar
821
+
822
+ * Dataset: [en-ar](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
823
+ * Size: 5,000 training samples
824
+ * Columns: <code>non_english</code> and <code>label</code>
825
+ * Approximate statistics based on the first 1000 samples:
826
+ | | non_english | label |
827
+ |:--------|:----------------------------------------------------------------------------------|:-------------------------------------|
828
+ | type | string | list |
829
+ | details | <ul><li>min: 4 tokens</li><li>mean: 27.3 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
830
+ * Samples:
831
+ | non_english | label |
832
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------|
833
+ | <code>حسناً ان ما نقوم به اليوم .. هو ان نجبر الطلاب لتعلم الرياضيات</code> | <code>[0.3943225145339966, 0.18910610675811768, -0.3788299858570099, 0.4386662542819977, 0.2727023661136627, ...]</code> |
834
+ | <code>انها المادة الاهم ..</code> | <code>[0.6257511377334595, -0.1750679910182953, -0.5734405517578125, 0.11480475962162018, 1.1682192087173462, ...]</code> |
835
+ | <code>انا لا انفي لدقيقة واحدة ان الذين يهتمون بالحسابات اليدوية والذين هوايتهم القيام بذلك .. او القيام بالطرق التقليدية في اي مجال ان يقوموا بذلك كما يريدون .</code> | <code>[-0.04564047232270241, 0.4971524775028229, 0.28066301345825195, -0.726702094078064, -0.17846377193927765, ...]</code> |
836
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
837
+
838
+ #### en-fr
839
+
840
+ * Dataset: [en-fr](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
841
+ * Size: 5,000 training samples
842
+ * Columns: <code>non_english</code> and <code>label</code>
843
+ * Approximate statistics based on the first 1000 samples:
844
+ | | non_english | label |
845
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------|
846
+ | type | string | list |
847
+ | details | <ul><li>min: 3 tokens</li><li>mean: 30.18 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
848
+ * Samples:
849
+ | non_english | label |
850
+ |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
851
+ | <code>Je ne crois pas que ce soit justifié.</code> | <code>[-0.361753910779953, 0.7323777079582214, 0.6518164277076721, -0.8461216688156128, -0.007496988866478205, ...]</code> |
852
+ | <code>Je fais cette distinction entre ce qu'on force les gens à faire et les matières générales, et la matière que quelqu'un va apprendre parce que ça lui plait et peut-être même exceller dans ce domaine.</code> | <code>[0.3047865629196167, 0.5270194411277771, 0.26616284251213074, 0.2612147927284241, 0.1950961947441101, ...]</code> |
853
+ | <code>Quels sont les problèmes en relation avec ça?</code> | <code>[0.2123892903327942, -0.09616081416606903, -0.41965243220329285, -0.5469444394111633, -0.6056491136550903, ...]</code> |
854
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
855
+
856
+ #### en-de
857
+
858
+ * Dataset: [en-de](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
859
+ * Size: 5,000 training samples
860
+ * Columns: <code>non_english</code> and <code>label</code>
861
+ * Approximate statistics based on the first 1000 samples:
862
+ | | non_english | label |
863
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------|
864
+ | type | string | list |
865
+ | details | <ul><li>min: 4 tokens</li><li>mean: 27.04 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
866
+ * Samples:
867
+ | non_english | label |
868
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
869
+ | <code>Ich denke, dass es sich aus diesem Grund lohnt, den Leuten das Rechnen von Hand beizubringen.</code> | <code>[0.0960279330611229, 0.7833179831504822, -0.09527698159217834, 0.8104371428489685, 0.7545774579048157, ...]</code> |
870
+ | <code>Außerdem gibt es ein paar bestimmte konzeptionelle Dinge, die das Rechnen per Hand rechtfertigen, aber ich glaube es sind sehr wenige.</code> | <code>[-0.5939837098121643, 0.9714100956916809, 0.6800686717033386, -0.21585524082183838, -0.7509503364562988, ...]</code> |
871
+ | <code>Eine Sache, die ich mich oft frage, ist Altgriechisch, und wie das zusammengehört.</code> | <code>[-0.09777048230171204, 0.07093209028244019, -0.42989012598991394, -0.1457514613866806, 1.4382753372192383, ...]</code> |
872
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
873
+
874
+ #### en-es
875
+
876
+ * Dataset: [en-es](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
877
+ * Size: 5,000 training samples
878
+ * Columns: <code>non_english</code> and <code>label</code>
879
+ * Approximate statistics based on the first 1000 samples:
880
+ | | non_english | label |
881
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------|
882
+ | type | string | list |
883
+ | details | <ul><li>min: 4 tokens</li><li>mean: 25.42 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
884
+ * Samples:
885
+ | non_english | label |
886
+ |:-----------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
887
+ | <code>Y luego hay ciertas aspectos conceptuales que pueden beneficiarse del cálculo a mano pero creo que son relativamente pocos.</code> | <code>[-0.5939835906028748, 0.9714106917381287, 0.6800685524940491, -0.2158554196357727, -0.7509507536888123, ...]</code> |
888
+ | <code>Algo que pregunto a menudo es sobre el griego antiguo y cómo se relaciona.</code> | <code>[-0.09777048230171204, 0.07093209028244019, -0.42989012598991394, -0.1457514613866806, 1.4382753372192383, ...]</code> |
889
+ | <code>Vean, lo que estamos haciendo ahora es forzar a la gente a aprender matemáticas.</code> | <code>[0.3943225145339966, 0.18910610675811768, -0.3788299858570099, 0.4386662542819977, 0.2727023661136627, ...]</code> |
890
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
891
+
892
+ #### en-tr
893
+
894
+ * Dataset: [en-tr](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
895
+ * Size: 5,000 training samples
896
+ * Columns: <code>non_english</code> and <code>label</code>
897
+ * Approximate statistics based on the first 1000 samples:
898
+ | | non_english | label |
899
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------|
900
+ | type | string | list |
901
+ | details | <ul><li>min: 4 tokens</li><li>mean: 24.72 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
902
+ * Samples:
903
+ | non_english | label |
904
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
905
+ | <code>Eğer insanlar elle hesaba ilgililerse ya da öğrenmek için özel amaçları varsa konu ne kadar acayip olursa olsun bunu öğrenmeliler, engellemeyi bir an için bile önermiyorum.</code> | <code>[-0.04564047232270241, 0.4971524775028229, 0.28066301345825195, -0.726702094078064, -0.17846377193927765, ...]</code> |
906
+ | <code>İnsanların kendi ilgi alanlarını takip etmeleri, kesinlikle doğru bir şeydir.</code> | <code>[0.2061387449502945, 0.5284574031829834, 0.3577779233455658, 0.28818392753601074, 0.17228049039840698, ...]</code> |
907
+ | <code>Ben bir biçimde Antik Yunan hakkında ilgiliyimdir. ancak tüm nüfusu Antik Yunan gibi bir konu hakkında bilgi edinmeye zorlamamalıyız.</code> | <code>[0.12050342559814453, 0.15652479231357574, 0.48636534810066223, -0.13693244755268097, 0.42764803767204285, ...]</code> |
908
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
909
+
910
+ #### en-it
911
+
912
+ * Dataset: [en-it](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
913
+ * Size: 5,000 training samples
914
+ * Columns: <code>non_english</code> and <code>label</code>
915
+ * Approximate statistics based on the first 1000 samples:
916
+ | | non_english | label |
917
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------|
918
+ | type | string | list |
919
+ | details | <ul><li>min: 3 tokens</li><li>mean: 26.41 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
920
+ * Samples:
921
+ | non_english | label |
922
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------|
923
+ | <code>Non credo che sia giustificato.</code> | <code>[-0.36175352334976196, 0.7323781251907349, 0.651816189289093, -0.8461223840713501, -0.007496151141822338, ...]</code> |
924
+ | <code>Perciò faccio distinzione tra quello che stiamo facendo fare alle persone, le materie che si ritengono principali, e le materie che le persone potrebbero seguire per loro interesse o forse a volte anche incitate a farlo.</code> | <code>[0.3047865927219391, 0.5270194411277771, 0.26616284251213074, 0.2612147927284241, 0.1950961947441101, ...]</code> |
925
+ | <code>Ma che argomenti porta la gente su questi temi?</code> | <code>[0.2123885154724121, -0.09616123884916306, -0.4196523427963257, -0.5469440817832947, -0.6056501865386963, ...]</code> |
926
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
927
+
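+ All six subsets follow the same multilingual knowledge-distillation recipe: each `non_english` sentence is paired with the 768-dimensional `label` vector that a teacher model produced for its English counterpart, and `MSELoss` regresses the student's embedding onto that target. A minimal sketch of the setup, assuming an unspecified teacher and the preprocessing described in the comments:
+ 
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.losses import MSELoss
+ 
+ # Student being distilled; wrapping the bare checkpoint adds mean pooling automatically.
+ student = SentenceTransformer("FacebookAI/xlm-roberta-base")
+ 
+ # Raw parallel data (English / non-English pairs); subset name taken from the links above.
+ train_en_ar = load_dataset("sentence-transformers/parallel-sentences-talks", "en-ar", split="train")
+ 
+ # Assumed preprocessing: the English column is replaced by the teacher's 768-d embeddings,
+ # so each row becomes (non_english, label) as in the tables above.
+ train_loss = MSELoss(model=student)
+ ```
+ 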
928
+ ### Evaluation Datasets
929
+
930
+ #### en-ar
931
+
932
+ * Dataset: [en-ar](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
933
+ * Size: 993 evaluation samples
934
+ * Columns: <code>non_english</code> and <code>label</code>
935
+ * Approximate statistics based on the first 1000 samples:
936
+ | | non_english | label |
937
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------|
938
+ | type | string | list |
939
+ | details | <ul><li>min: 3 tokens</li><li>mean: 28.03 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
940
+ * Samples:
941
+ | non_english | label |
942
+ |:------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
943
+ | <code>شكرا جزيلا كريس.</code> | <code>[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]</code> |
944
+ | <code>انه فعلا شرف عظيم لي ان أصعد المنصة للمرة الثانية. أنا في غاية الامتنان.</code> | <code>[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]</code> |
945
+ | <code>لقد بهرت فعلا بهذا المؤتمر, وأريد أن أشكركم جميعا على تعليقاتكم الطيبة على ما قلته تلك الليلة.</code> | <code>[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]</code> |
946
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
947
+
948
+ #### en-fr
949
+
950
+ * Dataset: [en-fr](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
951
+ * Size: 992 evaluation samples
952
+ * Columns: <code>non_english</code> and <code>label</code>
953
+ * Approximate statistics based on the first 1000 samples:
954
+ | | non_english | label |
955
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------|
956
+ | type | string | list |
957
+ | details | <ul><li>min: 4 tokens</li><li>mean: 30.72 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
958
+ * Samples:
959
+ | non_english | label |
960
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
961
+ | <code>Merci beaucoup, Chris.</code> | <code>[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]</code> |
962
+ | <code>C'est vraiment un honneur de pouvoir venir sur cette scène une deuxième fois. Je suis très reconnaissant.</code> | <code>[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]</code> |
963
+ | <code>J'ai été très impressionné par cette conférence, et je tiens à vous remercier tous pour vos nombreux et sympathiques commentaires sur ce que j'ai dit l'autre soir.</code> | <code>[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]</code> |
964
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
965
+
966
+ #### en-de
967
+
968
+ * Dataset: [en-de](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
969
+ * Size: 991 evaluation samples
970
+ * Columns: <code>non_english</code> and <code>label</code>
971
+ * Approximate statistics based on the first 1000 samples:
972
+ | | non_english | label |
973
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------|
974
+ | type | string | list |
975
+ | details | <ul><li>min: 4 tokens</li><li>mean: 27.71 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
976
+ * Samples:
977
+ | non_english | label |
978
+ |:-----------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
979
+ | <code>Vielen Dank, Chris.</code> | <code>[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]</code> |
980
+ | <code>Es ist mir wirklich eine Ehre, zweimal auf dieser Bühne stehen zu dürfen. Tausend Dank dafür.</code> | <code>[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]</code> |
981
+ | <code>Ich bin wirklich begeistert von dieser Konferenz, und ich danke Ihnen allen für die vielen netten Kommentare zu meiner Rede vorgestern Abend.</code> | <code>[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]</code> |
982
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
983
+
984
+ #### en-es
985
+
986
+ * Dataset: [en-es](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
987
+ * Size: 990 evaluation samples
988
+ * Columns: <code>non_english</code> and <code>label</code>
989
+ * Approximate statistics based on the first 1000 samples:
990
+ | | non_english | label |
991
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------|
992
+ | type | string | list |
993
+ | details | <ul><li>min: 4 tokens</li><li>mean: 26.47 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
994
+ * Samples:
995
+ | non_english | label |
996
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
997
+ | <code>Muchas gracias Chris.</code> | <code>[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]</code> |
998
+ | <code>Y es en verdad un gran honor tener la oportunidad de venir a este escenario por segunda vez. Estoy extremadamente agradecido.</code> | <code>[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]</code> |
999
+ | <code>He quedado conmovido por esta conferencia, y deseo agradecer a todos ustedes sus amables comentarios acerca de lo que tenía que decir la otra noche.</code> | <code>[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]</code> |
1000
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
1001
+
1002
+ #### en-tr
1003
+
1004
+ * Dataset: [en-tr](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
1005
+ * Size: 993 evaluation samples
1006
+ * Columns: <code>non_english</code> and <code>label</code>
1007
+ * Approximate statistics based on the first 1000 samples:
1008
+ | | non_english | label |
1009
+ |:--------|:----------------------------------------------------------------------------------|:-------------------------------------|
1010
+ | type | string | list |
1011
+ | details | <ul><li>min: 4 tokens</li><li>mean: 25.4 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
1012
+ * Samples:
1013
+ | non_english | label |
1014
+ |:----------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
1015
+ | <code>Çok teşekkür ederim Chris.</code> | <code>[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]</code> |
1016
+ | <code>Bu sahnede ikinci kez yer alma fırsatına sahip olmak gerçekten büyük bir onur. Çok minnettarım.</code> | <code>[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]</code> |
1017
+ | <code>Bu konferansta çok mutlu oldum, ve anlattıklarımla ilgili güzel yorumlarınız için sizlere çok teşekkür ederim.</code> | <code>[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]</code> |
1018
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
1019
+
1020
+ #### en-it
1021
+
1022
+ * Dataset: [en-it](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) at [d366ddd](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks/tree/d366dddc3d1ef0421a41f9e534bad4efae6d7730)
1023
+ * Size: 993 evaluation samples
1024
+ * Columns: <code>non_english</code> and <code>label</code>
1025
+ * Approximate statistics based on the first 1000 samples:
1026
+ | | non_english | label |
1027
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------|
1028
+ | type | string | list |
1029
+ | details | <ul><li>min: 4 tokens</li><li>mean: 27.94 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
1030
+ * Samples:
1031
+ | non_english | label |
1032
+ |:--------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
1033
+ | <code>Grazie mille, Chris.</code> | <code>[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]</code> |
1034
+ | <code>E’ veramente un grande onore venire su questo palco due volte. Vi sono estremamente grato.</code> | <code>[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]</code> |
1035
+ | <code>Sono impressionato da questa conferenza, e voglio ringraziare tutti voi per i tanti, lusinghieri commenti, anche perché... Ne ho bisogno!!</code> | <code>[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]</code> |
1036
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/losses.html#mseloss)
1037
+
1038
+ ### Training Hyperparameters
1039
+ #### Non-Default Hyperparameters
1040
+
1041
+ - `eval_strategy`: steps
1042
+ - `per_device_train_batch_size`: 64
1043
+ - `per_device_eval_batch_size`: 64
1044
+ - `learning_rate`: 2e-05
1045
+ - `num_train_epochs`: 5
1046
+ - `warmup_ratio`: 0.1
1047
+ - `fp16`: True
1048
+
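+ These non-default values map directly onto `SentenceTransformerTrainingArguments`. A sketch of how they could be passed (the `output_dir` is a placeholder):
+ 
+ ```python
+ from sentence_transformers import SentenceTransformerTrainingArguments
+ 
+ args = SentenceTransformerTrainingArguments(
+     output_dir="outputs/xlm-r-multilingual-distill",  # placeholder path
+     eval_strategy="steps",
+     per_device_train_batch_size=64,
+     per_device_eval_batch_size=64,
+     learning_rate=2e-5,
+     num_train_epochs=5,
+     warmup_ratio=0.1,
+     fp16=True,
+ )
+ ```
+ 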
1049
+ #### All Hyperparameters
1050
+ <details><summary>Click to expand</summary>
1051
+
1052
+ - `overwrite_output_dir`: False
1053
+ - `do_predict`: False
1054
+ - `eval_strategy`: steps
1055
+ - `prediction_loss_only`: False
1056
+ - `per_device_train_batch_size`: 64
1057
+ - `per_device_eval_batch_size`: 64
1058
+ - `per_gpu_train_batch_size`: None
1059
+ - `per_gpu_eval_batch_size`: None
1060
+ - `gradient_accumulation_steps`: 1
1061
+ - `eval_accumulation_steps`: None
1062
+ - `learning_rate`: 2e-05
1063
+ - `weight_decay`: 0.0
1064
+ - `adam_beta1`: 0.9
1065
+ - `adam_beta2`: 0.999
1066
+ - `adam_epsilon`: 1e-08
1067
+ - `max_grad_norm`: 1.0
1068
+ - `num_train_epochs`: 5
1069
+ - `max_steps`: -1
1070
+ - `lr_scheduler_type`: linear
1071
+ - `lr_scheduler_kwargs`: {}
1072
+ - `warmup_ratio`: 0.1
1073
+ - `warmup_steps`: 0
1074
+ - `log_level`: passive
1075
+ - `log_level_replica`: warning
1076
+ - `log_on_each_node`: True
1077
+ - `logging_nan_inf_filter`: True
1078
+ - `save_safetensors`: True
1079
+ - `save_on_each_node`: False
1080
+ - `save_only_model`: False
1081
+ - `no_cuda`: False
1082
+ - `use_cpu`: False
1083
+ - `use_mps_device`: False
1084
+ - `seed`: 42
1085
+ - `data_seed`: None
1086
+ - `jit_mode_eval`: False
1087
+ - `use_ipex`: False
1088
+ - `bf16`: False
1089
+ - `fp16`: True
1090
+ - `fp16_opt_level`: O1
1091
+ - `half_precision_backend`: auto
1092
+ - `bf16_full_eval`: False
1093
+ - `fp16_full_eval`: False
1094
+ - `tf32`: None
1095
+ - `local_rank`: 0
1096
+ - `ddp_backend`: None
1097
+ - `tpu_num_cores`: None
1098
+ - `tpu_metrics_debug`: False
1099
+ - `debug`: []
1100
+ - `dataloader_drop_last`: False
1101
+ - `dataloader_num_workers`: 0
1102
+ - `dataloader_prefetch_factor`: None
1103
+ - `past_index`: -1
1104
+ - `disable_tqdm`: False
1105
+ - `remove_unused_columns`: True
1106
+ - `label_names`: None
1107
+ - `load_best_model_at_end`: False
1108
+ - `ignore_data_skip`: False
1109
+ - `fsdp`: []
1110
+ - `fsdp_min_num_params`: 0
1111
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
1112
+ - `fsdp_transformer_layer_cls_to_wrap`: None
1113
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
1114
+ - `deepspeed`: None
1115
+ - `label_smoothing_factor`: 0.0
1116
+ - `optim`: adamw_torch
1117
+ - `optim_args`: None
1118
+ - `adafactor`: False
1119
+ - `group_by_length`: False
1120
+ - `length_column_name`: length
1121
+ - `ddp_find_unused_parameters`: None
1122
+ - `ddp_bucket_cap_mb`: None
1123
+ - `ddp_broadcast_buffers`: None
1124
+ - `dataloader_pin_memory`: True
1125
+ - `dataloader_persistent_workers`: False
1126
+ - `skip_memory_metrics`: True
1127
+ - `use_legacy_prediction_loop`: False
1128
+ - `push_to_hub`: False
1129
+ - `resume_from_checkpoint`: None
1130
+ - `hub_model_id`: None
1131
+ - `hub_strategy`: every_save
1132
+ - `hub_private_repo`: False
1133
+ - `hub_always_push`: False
1134
+ - `gradient_checkpointing`: False
1135
+ - `gradient_checkpointing_kwargs`: None
1136
+ - `include_inputs_for_metrics`: False
1137
+ - `eval_do_concat_batches`: True
1138
+ - `fp16_backend`: auto
1139
+ - `push_to_hub_model_id`: None
1140
+ - `push_to_hub_organization`: None
1141
+ - `mp_parameters`:
1142
+ - `auto_find_batch_size`: False
1143
+ - `full_determinism`: False
1144
+ - `torchdynamo`: None
1145
+ - `ray_scope`: last
1146
+ - `ddp_timeout`: 1800
1147
+ - `torch_compile`: False
1148
+ - `torch_compile_backend`: None
1149
+ - `torch_compile_mode`: None
1150
+ - `dispatch_batches`: None
1151
+ - `split_batches`: None
1152
+ - `include_tokens_per_second`: False
1153
+ - `include_num_input_tokens_seen`: False
1154
+ - `neftune_noise_alpha`: None
1155
+ - `optim_target_modules`: None
1156
+ - `batch_sampler`: batch_sampler
1157
+ - `multi_dataset_batch_sampler`: proportional
1158
+
1159
+ </details>
1160
+
1161
+ ### Training Logs
1162
+ | Epoch | Step | Training Loss | en-ar loss | en-it loss | en-de loss | en-fr loss | en-es loss | en-tr loss | en-ar_mean_accuracy | en-ar_negative_mse | en-de_mean_accuracy | en-de_negative_mse | en-es_mean_accuracy | en-es_negative_mse | en-fr_mean_accuracy | en-fr_negative_mse | en-it_mean_accuracy | en-it_negative_mse | en-tr_mean_accuracy | en-tr_negative_mse | sts17-en-ar-test_spearman_max | sts17-en-de-test_spearman_max | sts17-en-tr-test_spearman_max | sts17-es-en-test_spearman_max | sts17-fr-en-test_spearman_max | sts17-it-en-test_spearman_max |
1163
+ |:------:|:----:|:-------------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:-------------------:|:------------------:|:-------------------:|:------------------:|:-------------------:|:------------------:|:-------------------:|:------------------:|:-------------------:|:------------------:|:-------------------:|:------------------:|:-----------------------------:|:-----------------------------:|:-----------------------------:|:-----------------------------:|:-----------------------------:|:-----------------------------:|
1164
+ | 0.2110 | 100 | 0.5581 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1165
+ | 0.4219 | 200 | 0.3071 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1166
+ | 0.6329 | 300 | 0.2675 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1167
+ | 0.8439 | 400 | 0.2606 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1168
+ | 1.0549 | 500 | 0.2589 | 0.2519 | 0.2498 | 0.2511 | 0.2488 | 0.2503 | 0.2512 | 0.1254 | -25.1903 | 0.2523 | -25.1089 | 0.2591 | -25.0276 | 0.2409 | -24.8803 | 0.2180 | -24.9768 | 0.1158 | -25.1219 | 0.0308 | 0.1281 | 0.1610 | 0.1465 | 0.0552 | 0.0518 |
1169
+ | 1.2658 | 600 | 0.2504 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1170
+ | 1.4768 | 700 | 0.2427 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1171
+ | 1.6878 | 800 | 0.2337 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1172
+ | 1.8987 | 900 | 0.2246 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1173
+ | 2.1097 | 1000 | 0.2197 | 0.2202 | 0.2157 | 0.2151 | 0.2147 | 0.2139 | 0.2218 | 0.5841 | -22.0204 | 0.8012 | -21.5087 | 0.8495 | -21.3935 | 0.7959 | -21.4660 | 0.7815 | -21.5699 | 0.6007 | -22.1778 | 0.3346 | 0.4013 | 0.4727 | 0.3353 | 0.3827 | 0.3292 |
1174
+ | 2.3207 | 1100 | 0.2163 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1175
+ | 2.5316 | 1200 | 0.2123 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1176
+ | 2.7426 | 1300 | 0.2069 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1177
+ | 2.9536 | 1400 | 0.2048 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1178
+ | 3.1646 | 1500 | 0.2009 | 0.2086 | 0.2029 | 0.2022 | 0.2012 | 0.2002 | 0.2111 | 0.7367 | -20.8567 | 0.8739 | -20.2247 | 0.9303 | -20.0215 | 0.8755 | -20.1213 | 0.8600 | -20.2900 | 0.7165 | -21.1119 | 0.4087 | 0.5473 | 0.5551 | 0.4724 | 0.4882 | 0.4690 |
1179
+ | 3.3755 | 1600 | 0.2019 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1180
+ | 3.5865 | 1700 | 0.1989 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1181
+ | 3.7975 | 1800 | 0.196 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1182
+ | 4.0084 | 1900 | 0.1943 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1183
+ | 4.2194 | 2000 | 0.194 | 0.2040 | 0.1977 | 0.1973 | 0.1962 | 0.1947 | 0.2075 | 0.7714 | -20.3955 | 0.8915 | -19.7279 | 0.9449 | -19.4724 | 0.8942 | -19.6232 | 0.8807 | -19.7699 | 0.7432 | -20.7547 | 0.4425 | 0.5618 | 0.5819 | 0.5021 | 0.5334 | 0.5250 |
1184
+ | 4.4304 | 2100 | 0.1951 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1185
+ | 4.6414 | 2200 | 0.1928 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1186
+ | 4.8523 | 2300 | 0.1909 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1187
+
1188
+
1189
+ ### Environmental Impact
1190
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon); a sketch of the measurement setup follows the list below.
1191
+ - **Energy Consumed**: 0.060 kWh
1192
+ - **Carbon Emitted**: 0.023 kg of CO2
1193
+ - **Hours Used**: 0.179 hours
1194
+
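+ A hedged sketch of how such measurements can be collected with CodeCarbon's `EmissionsTracker`; the `train_model` function is a placeholder for the actual training loop.
+ 
+ ```python
+ # Sketch only: wraps a training run with CodeCarbon to estimate energy use and emissions.
+ from codecarbon import EmissionsTracker
+ 
+ def train_model():
+     pass  # placeholder for the real training loop (assumption)
+ 
+ tracker = EmissionsTracker()
+ tracker.start()
+ try:
+     train_model()
+ finally:
+     emissions_kg = tracker.stop()  # estimated kg of CO2-equivalent
+     print(f"Estimated emissions: {emissions_kg:.3f} kg CO2eq")
+ ```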
1195
+ ### Training Hardware
1196
+ - **On Cloud**: No
1197
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
1198
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
1199
+ - **RAM Size**: 31.78 GB
1200
+
1201
+ ### Framework Versions
1202
+ - Python: 3.11.6
1203
+ - Sentence Transformers: 3.0.0.dev0
1204
+ - Transformers: 4.41.0.dev0
1205
+ - PyTorch: 2.3.0+cu121
1206
+ - Accelerate: 0.26.1
1207
+ - Datasets: 2.18.0
1208
+ - Tokenizers: 0.19.1
1209
+
1210
+ ## Citation
1211
+
1212
+ ### BibTeX
1213
+
1214
+ #### Sentence Transformers
1215
+ ```bibtex
1216
+ @inproceedings{reimers-2019-sentence-bert,
1217
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
1218
+ author = "Reimers, Nils and Gurevych, Iryna",
1219
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
1220
+ month = "11",
1221
+ year = "2019",
1222
+ publisher = "Association for Computational Linguistics",
1223
+ url = "https://arxiv.org/abs/1908.10084",
1224
+ }
1225
+ ```
1226
+
1227
+ #### MSELoss
1228
+ ```bibtex
1229
+ @inproceedings{reimers-2020-multilingual-sentence-bert,
1230
+ title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
1231
+ author = "Reimers, Nils and Gurevych, Iryna",
1232
+ booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
1233
+ month = "11",
1234
+ year = "2020",
1235
+ publisher = "Association for Computational Linguistics",
1236
+ url = "https://arxiv.org/abs/2004.09813",
1237
+ }
1238
+ ```
1239
+
1240
+ <!--
1241
+ ## Glossary
1242
+
1243
+ *Clearly define terms in order to be accessible across audiences.*
1244
+ -->
1245
+
1246
+ <!--
1247
+ ## Model Card Authors
1248
+
1249
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
1250
+ -->
1251
+
1252
+ <!--
1253
+ ## Model Card Contact
1254
+
1255
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
1256
+ -->
config.json ADDED
@@ -0,0 +1,28 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "xlm-roberta-base",
3
+ "architectures": [
4
+ "XLMRobertaModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "bos_token_id": 0,
8
+ "classifier_dropout": null,
9
+ "eos_token_id": 2,
10
+ "hidden_act": "gelu",
11
+ "hidden_dropout_prob": 0.1,
12
+ "hidden_size": 768,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 3072,
15
+ "layer_norm_eps": 1e-05,
16
+ "max_position_embeddings": 514,
17
+ "model_type": "xlm-roberta",
18
+ "num_attention_heads": 12,
19
+ "num_hidden_layers": 12,
20
+ "output_past": true,
21
+ "pad_token_id": 1,
22
+ "position_embedding_type": "absolute",
23
+ "torch_dtype": "float32",
24
+ "transformers_version": "4.41.0.dev0",
25
+ "type_vocab_size": 1,
26
+ "use_cache": true,
27
+ "vocab_size": 250002
28
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.0.0.dev0",
4
+ "transformers": "4.41.0.dev0",
5
+ "pytorch": "2.3.0+cu121"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": null
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:149e16f1341357d04aa0ffc019a3dd067c6a43fd0e9c878c9b981c08c577cabd
3
+ size 1112197096
modules.json ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ }
14
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ {
2
+ "max_seq_length": 128,
3
+ "do_lower_case": false
4
+ }
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
3
+ size 5069051
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token": "<s>",
3
+ "cls_token": "<s>",
4
+ "eos_token": "</s>",
5
+ "mask_token": {
6
+ "content": "<mask>",
7
+ "lstrip": true,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false
11
+ },
12
+ "pad_token": "<pad>",
13
+ "sep_token": "</s>",
14
+ "unk_token": "<unk>"
15
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cad551d5600a84242d0973327029452a1e3672ba6313c2a3c3d69c4310e12719
3
+ size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1,54 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "<s>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "<pad>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "</s>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "3": {
28
+ "content": "<unk>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "250001": {
36
+ "content": "<mask>",
37
+ "lstrip": true,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "bos_token": "<s>",
45
+ "clean_up_tokenization_spaces": true,
46
+ "cls_token": "<s>",
47
+ "eos_token": "</s>",
48
+ "mask_token": "<mask>",
49
+ "model_max_length": 512,
50
+ "pad_token": "<pad>",
51
+ "sep_token": "</s>",
52
+ "tokenizer_class": "XLMRobertaTokenizer",
53
+ "unk_token": "<unk>"
54
+ }