alaeddine-13 committed ae44071 (1 parent: 9c80cc1)

README draft

Files changed (1)
  1. README.md +104 -2597
README.md CHANGED
@@ -2,2607 +2,114 @@
2
  pipeline_tag: sentence-similarity
3
  tags:
4
  - finetuner
 
5
  - sentence-transformers
6
  - feature-extraction
7
  - sentence-similarity
8
- - mteb
9
  datasets:
10
- - jinaai/negation-dataset
11
  language: en
12
  license: apache-2.0
13
  model-index:
14
- - name: jina-embedding-s-en-v2
15
- results:
16
- - task:
17
- type: Classification
18
- dataset:
19
- type: mteb/amazon_counterfactual
20
- name: MTEB AmazonCounterfactualClassification (en)
21
- config: en
22
- split: test
23
- revision: e8379541af4e31359cca9fbcf4b00f2671dba205
24
- metrics:
25
- - type: accuracy
26
- value: 69.70149253731343
27
- - type: ap
28
- value: 32.22528779918184
29
- - type: f1
30
- value: 63.66857824618267
31
- - task:
32
- type: Classification
33
- dataset:
34
- type: mteb/amazon_polarity
35
- name: MTEB AmazonPolarityClassification
36
- config: default
37
- split: test
38
- revision: e2d317d38cd51312af73b3d32a06d1a08b442046
39
- metrics:
40
- - type: accuracy
41
- value: 79.55879999999999
42
- - type: ap
43
- value: 73.97885664972738
44
- - type: f1
45
- value: 79.4849322624122
46
- - task:
47
- type: Classification
48
- dataset:
49
- type: mteb/amazon_reviews_multi
50
- name: MTEB AmazonReviewsClassification (en)
51
- config: en
52
- split: test
53
- revision: 1399c76144fd37290681b995c656ef9b2e06e26d
54
- metrics:
55
- - type: accuracy
56
- value: 38.69
57
- - type: f1
58
- value: 37.17512734389121
59
- - task:
60
- type: Retrieval
61
- dataset:
62
- type: arguana
63
- name: MTEB ArguAna
64
- config: default
65
- split: test
66
- revision: None
67
- metrics:
68
- - type: map_at_1
69
- value: 23.684
70
- - type: map_at_10
71
- value: 39.086999999999996
72
- - type: map_at_100
73
- value: 40.222
74
- - type: map_at_1000
75
- value: 40.231
76
- - type: map_at_3
77
- value: 34.282000000000004
78
- - type: map_at_5
79
- value: 36.689
80
- - type: mrr_at_1
81
- value: 23.826
82
- - type: mrr_at_10
83
- value: 39.147
84
- - type: mrr_at_100
85
- value: 40.282000000000004
86
- - type: mrr_at_1000
87
- value: 40.291
88
- - type: mrr_at_3
89
- value: 34.353
90
- - type: mrr_at_5
91
- value: 36.739
92
- - type: ndcg_at_1
93
- value: 23.684
94
- - type: ndcg_at_10
95
- value: 48.081
96
- - type: ndcg_at_100
97
- value: 52.902
98
- - type: ndcg_at_1000
99
- value: 53.111
100
- - type: ndcg_at_3
101
- value: 37.937
102
- - type: ndcg_at_5
103
- value: 42.32
104
- - type: precision_at_1
105
- value: 23.684
106
- - type: precision_at_10
107
- value: 7.703
108
- - type: precision_at_100
109
- value: 0.98
110
- - type: precision_at_1000
111
- value: 0.1
112
- - type: precision_at_3
113
- value: 16.192999999999998
114
- - type: precision_at_5
115
- value: 11.863
116
- - type: recall_at_1
117
- value: 23.684
118
- - type: recall_at_10
119
- value: 77.027
120
- - type: recall_at_100
121
- value: 98.009
122
- - type: recall_at_1000
123
- value: 99.57300000000001
124
- - type: recall_at_3
125
- value: 48.577999999999996
126
- - type: recall_at_5
127
- value: 59.317
128
- - task:
129
- type: Clustering
130
- dataset:
131
- type: mteb/arxiv-clustering-p2p
132
- name: MTEB ArxivClusteringP2P
133
- config: default
134
- split: test
135
- revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
136
- metrics:
137
- - type: v_measure
138
- value: 44.249612940073035
139
- - task:
140
- type: Clustering
141
- dataset:
142
- type: mteb/arxiv-clustering-s2s
143
- name: MTEB ArxivClusteringS2S
144
- config: default
145
- split: test
146
- revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
147
- metrics:
148
- - type: v_measure
149
- value: 35.39423011105325
150
- - task:
151
- type: Reranking
152
- dataset:
153
- type: mteb/askubuntudupquestions-reranking
154
- name: MTEB AskUbuntuDupQuestions
155
- config: default
156
- split: test
157
- revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
158
- metrics:
159
- - type: map
160
- value: 59.89078304869791
161
- - type: mrr
162
- value: 73.5045948203843
163
- - task:
164
- type: STS
165
- dataset:
166
- type: mteb/biosses-sts
167
- name: MTEB BIOSSES
168
- config: default
169
- split: test
170
- revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
171
- metrics:
172
- - type: cos_sim_pearson
173
- value: 82.49373811125967
174
- - type: cos_sim_spearman
175
- value: 81.0446177409314
176
- - type: euclidean_pearson
177
- value: 82.1327844624042
178
- - type: euclidean_spearman
179
- value: 81.0446177409314
180
- - type: manhattan_pearson
181
- value: 81.88575541723692
182
- - type: manhattan_spearman
183
- value: 81.0705219456341
184
- - task:
185
- type: Classification
186
- dataset:
187
- type: mteb/banking77
188
- name: MTEB Banking77Classification
189
- config: default
190
- split: test
191
- revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
192
- metrics:
193
- - type: accuracy
194
- value: 78.27272727272728
195
- - type: f1
196
- value: 77.36583416688741
197
- - task:
198
- type: Clustering
199
- dataset:
200
- type: mteb/biorxiv-clustering-p2p
201
- name: MTEB BiorxivClusteringP2P
202
- config: default
203
- split: test
204
- revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
205
- metrics:
206
- - type: v_measure
207
- value: 36.12447585258704
208
- - task:
209
- type: Clustering
210
- dataset:
211
- type: mteb/biorxiv-clustering-s2s
212
- name: MTEB BiorxivClusteringS2S
213
- config: default
214
- split: test
215
- revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
216
- metrics:
217
- - type: v_measure
218
- value: 29.305990951348743
219
- - task:
220
- type: Retrieval
221
- dataset:
222
- type: BeIR/cqadupstack
223
- name: MTEB CQADupstackAndroidRetrieval
224
- config: default
225
- split: test
226
- revision: None
227
- metrics:
228
- - type: map_at_1
229
- value: 31.458000000000002
230
- - type: map_at_10
231
- value: 42.132
232
- - type: map_at_100
233
- value: 43.47
234
- - type: map_at_1000
235
- value: 43.612
236
- - type: map_at_3
237
- value: 38.718
238
- - type: map_at_5
239
- value: 40.556
240
- - type: mrr_at_1
241
- value: 38.627
242
- - type: mrr_at_10
243
- value: 47.998000000000005
244
- - type: mrr_at_100
245
- value: 48.726
246
- - type: mrr_at_1000
247
- value: 48.778
248
- - type: mrr_at_3
249
- value: 45.255
250
- - type: mrr_at_5
251
- value: 46.893
252
- - type: ndcg_at_1
253
- value: 38.627
254
- - type: ndcg_at_10
255
- value: 48.229
256
- - type: ndcg_at_100
257
- value: 53.108999999999995
258
- - type: ndcg_at_1000
259
- value: 55.385
260
- - type: ndcg_at_3
261
- value: 43.191
262
- - type: ndcg_at_5
263
- value: 45.385999999999996
264
- - type: precision_at_1
265
- value: 38.627
266
- - type: precision_at_10
267
- value: 9.142
268
- - type: precision_at_100
269
- value: 1.462
270
- - type: precision_at_1000
271
- value: 0.19499999999999998
272
- - type: precision_at_3
273
- value: 20.552999999999997
274
- - type: precision_at_5
275
- value: 14.677999999999999
276
- - type: recall_at_1
277
- value: 31.458000000000002
278
- - type: recall_at_10
279
- value: 59.619
280
- - type: recall_at_100
281
- value: 79.953
282
- - type: recall_at_1000
283
- value: 94.921
284
- - type: recall_at_3
285
- value: 44.744
286
- - type: recall_at_5
287
- value: 51.010999999999996
288
- - task:
289
- type: Retrieval
290
- dataset:
291
- type: BeIR/cqadupstack
292
- name: MTEB CQADupstackEnglishRetrieval
293
- config: default
294
- split: test
295
- revision: None
296
- metrics:
297
- - type: map_at_1
298
- value: 26.762000000000004
299
- - type: map_at_10
300
- value: 35.366
301
- - type: map_at_100
302
- value: 36.481
303
- - type: map_at_1000
304
- value: 36.614999999999995
305
- - type: map_at_3
306
- value: 33.071
307
- - type: map_at_5
308
- value: 34.495
309
- - type: mrr_at_1
310
- value: 33.312000000000005
311
- - type: mrr_at_10
312
- value: 40.841
313
- - type: mrr_at_100
314
- value: 41.54
315
- - type: mrr_at_1000
316
- value: 41.592
317
- - type: mrr_at_3
318
- value: 38.928000000000004
319
- - type: mrr_at_5
320
- value: 40.119
321
- - type: ndcg_at_1
322
- value: 33.312000000000005
323
- - type: ndcg_at_10
324
- value: 40.238
325
- - type: ndcg_at_100
326
- value: 44.647
327
- - type: ndcg_at_1000
328
- value: 47.010999999999996
329
- - type: ndcg_at_3
330
- value: 36.991
331
- - type: ndcg_at_5
332
- value: 38.721
333
- - type: precision_at_1
334
- value: 33.312000000000005
335
- - type: precision_at_10
336
- value: 7.4079999999999995
337
- - type: precision_at_100
338
- value: 1.253
339
- - type: precision_at_1000
340
- value: 0.17500000000000002
341
- - type: precision_at_3
342
- value: 17.898
343
- - type: precision_at_5
344
- value: 12.687999999999999
345
- - type: recall_at_1
346
- value: 26.762000000000004
347
- - type: recall_at_10
348
- value: 48.41
349
- - type: recall_at_100
350
- value: 67.523
351
- - type: recall_at_1000
352
- value: 82.91199999999999
353
- - type: recall_at_3
354
- value: 38.6
355
- - type: recall_at_5
356
- value: 43.477
357
- - task:
358
- type: Retrieval
359
- dataset:
360
- type: BeIR/cqadupstack
361
- name: MTEB CQADupstackGamingRetrieval
362
- config: default
363
- split: test
364
- revision: None
365
- metrics:
366
- - type: map_at_1
367
- value: 37.578
368
- - type: map_at_10
369
- value: 49.415
370
- - type: map_at_100
371
- value: 50.339
372
- - type: map_at_1000
373
- value: 50.402
374
- - type: map_at_3
375
- value: 46.412
376
- - type: map_at_5
377
- value: 48.183
378
- - type: mrr_at_1
379
- value: 43.072
380
- - type: mrr_at_10
381
- value: 52.82599999999999
382
- - type: mrr_at_100
383
- value: 53.456
384
- - type: mrr_at_1000
385
- value: 53.493
386
- - type: mrr_at_3
387
- value: 50.407999999999994
388
- - type: mrr_at_5
389
- value: 51.922000000000004
390
- - type: ndcg_at_1
391
- value: 43.072
392
- - type: ndcg_at_10
393
- value: 54.949000000000005
394
- - type: ndcg_at_100
395
- value: 58.744
396
- - type: ndcg_at_1000
397
- value: 60.150000000000006
398
- - type: ndcg_at_3
399
- value: 49.864000000000004
400
- - type: ndcg_at_5
401
- value: 52.503
402
- - type: precision_at_1
403
- value: 43.072
404
- - type: precision_at_10
405
- value: 8.734
406
- - type: precision_at_100
407
- value: 1.1520000000000001
408
- - type: precision_at_1000
409
- value: 0.132
410
- - type: precision_at_3
411
- value: 22.131999999999998
412
- - type: precision_at_5
413
- value: 15.21
414
- - type: recall_at_1
415
- value: 37.578
416
- - type: recall_at_10
417
- value: 67.918
418
- - type: recall_at_100
419
- value: 84.373
420
- - type: recall_at_1000
421
- value: 94.529
422
- - type: recall_at_3
423
- value: 54.457
424
- - type: recall_at_5
425
- value: 60.941
426
- - task:
427
- type: Retrieval
428
- dataset:
429
- type: BeIR/cqadupstack
430
- name: MTEB CQADupstackGisRetrieval
431
- config: default
432
- split: test
433
- revision: None
434
- metrics:
435
- - type: map_at_1
436
- value: 23.394000000000002
437
- - type: map_at_10
438
- value: 31.791000000000004
439
- - type: map_at_100
440
- value: 32.64
441
- - type: map_at_1000
442
- value: 32.727000000000004
443
- - type: map_at_3
444
- value: 29.557
445
- - type: map_at_5
446
- value: 30.858999999999998
447
- - type: mrr_at_1
448
- value: 25.085
449
- - type: mrr_at_10
450
- value: 33.721000000000004
451
- - type: mrr_at_100
452
- value: 34.492
453
- - type: mrr_at_1000
454
- value: 34.564
455
- - type: mrr_at_3
456
- value: 31.619999999999997
457
- - type: mrr_at_5
458
- value: 32.896
459
- - type: ndcg_at_1
460
- value: 25.085
461
- - type: ndcg_at_10
462
- value: 36.370000000000005
463
- - type: ndcg_at_100
464
- value: 40.96
465
- - type: ndcg_at_1000
466
- value: 43.171
467
- - type: ndcg_at_3
468
- value: 32.104
469
- - type: ndcg_at_5
470
- value: 34.300000000000004
471
- - type: precision_at_1
472
- value: 25.085
473
- - type: precision_at_10
474
- value: 5.537
475
- - type: precision_at_100
476
- value: 0.8340000000000001
477
- - type: precision_at_1000
478
- value: 0.105
479
- - type: precision_at_3
480
- value: 13.71
481
- - type: precision_at_5
482
- value: 9.514
483
- - type: recall_at_1
484
- value: 23.394000000000002
485
- - type: recall_at_10
486
- value: 48.549
487
- - type: recall_at_100
488
- value: 70.341
489
- - type: recall_at_1000
490
- value: 87.01299999999999
491
- - type: recall_at_3
492
- value: 36.947
493
- - type: recall_at_5
494
- value: 42.365
495
- - task:
496
- type: Retrieval
497
- dataset:
498
- type: BeIR/cqadupstack
499
- name: MTEB CQADupstackMathematicaRetrieval
500
- config: default
501
- split: test
502
- revision: None
503
- metrics:
504
- - type: map_at_1
505
- value: 14.818000000000001
506
- - type: map_at_10
507
- value: 21.773999999999997
508
- - type: map_at_100
509
- value: 22.787
510
- - type: map_at_1000
511
- value: 22.915
512
- - type: map_at_3
513
- value: 19.414
514
- - type: map_at_5
515
- value: 20.651
516
- - type: mrr_at_1
517
- value: 18.657
518
- - type: mrr_at_10
519
- value: 25.794
520
- - type: mrr_at_100
521
- value: 26.695999999999998
522
- - type: mrr_at_1000
523
- value: 26.776
524
- - type: mrr_at_3
525
- value: 23.279
526
- - type: mrr_at_5
527
- value: 24.598
528
- - type: ndcg_at_1
529
- value: 18.657
530
- - type: ndcg_at_10
531
- value: 26.511000000000003
532
- - type: ndcg_at_100
533
- value: 31.447999999999997
534
- - type: ndcg_at_1000
535
- value: 34.71
536
- - type: ndcg_at_3
537
- value: 21.92
538
- - type: ndcg_at_5
539
- value: 23.938000000000002
540
- - type: precision_at_1
541
- value: 18.657
542
- - type: precision_at_10
543
- value: 4.9
544
- - type: precision_at_100
545
- value: 0.851
546
- - type: precision_at_1000
547
- value: 0.127
548
- - type: precision_at_3
549
- value: 10.488999999999999
550
- - type: precision_at_5
551
- value: 7.710999999999999
552
- - type: recall_at_1
553
- value: 14.818000000000001
554
- - type: recall_at_10
555
- value: 37.408
556
- - type: recall_at_100
557
- value: 58.81999999999999
558
- - type: recall_at_1000
559
- value: 82.612
560
- - type: recall_at_3
561
- value: 24.561
562
- - type: recall_at_5
563
- value: 29.685
564
- - task:
565
- type: Retrieval
566
- dataset:
567
- type: BeIR/cqadupstack
568
- name: MTEB CQADupstackPhysicsRetrieval
569
- config: default
570
- split: test
571
- revision: None
572
- metrics:
573
- - type: map_at_1
574
- value: 26.332
575
- - type: map_at_10
576
- value: 35.366
577
- - type: map_at_100
578
- value: 36.569
579
- - type: map_at_1000
580
- value: 36.689
581
- - type: map_at_3
582
- value: 32.582
583
- - type: map_at_5
584
- value: 34.184
585
- - type: mrr_at_1
586
- value: 32.05
587
- - type: mrr_at_10
588
- value: 40.902
589
- - type: mrr_at_100
590
- value: 41.754000000000005
591
- - type: mrr_at_1000
592
- value: 41.811
593
- - type: mrr_at_3
594
- value: 38.547
595
- - type: mrr_at_5
596
- value: 40.019
597
- - type: ndcg_at_1
598
- value: 32.05
599
- - type: ndcg_at_10
600
- value: 40.999
601
- - type: ndcg_at_100
602
- value: 46.284
603
- - type: ndcg_at_1000
604
- value: 48.698
605
- - type: ndcg_at_3
606
- value: 36.39
607
- - type: ndcg_at_5
608
- value: 38.699
609
- - type: precision_at_1
610
- value: 32.05
611
- - type: precision_at_10
612
- value: 7.315
613
- - type: precision_at_100
614
- value: 1.172
615
- - type: precision_at_1000
616
- value: 0.156
617
- - type: precision_at_3
618
- value: 17.036
619
- - type: precision_at_5
620
- value: 12.089
621
- - type: recall_at_1
622
- value: 26.332
623
- - type: recall_at_10
624
- value: 52.410000000000004
625
- - type: recall_at_100
626
- value: 74.763
627
- - type: recall_at_1000
628
- value: 91.03
629
- - type: recall_at_3
630
- value: 39.527
631
- - type: recall_at_5
632
- value: 45.517
633
- - task:
634
- type: Retrieval
635
- dataset:
636
- type: BeIR/cqadupstack
637
- name: MTEB CQADupstackProgrammersRetrieval
638
- config: default
639
- split: test
640
- revision: None
641
- metrics:
642
- - type: map_at_1
643
- value: 22.849
644
- - type: map_at_10
645
- value: 31.502000000000002
646
- - type: map_at_100
647
- value: 32.854
648
- - type: map_at_1000
649
- value: 32.975
650
- - type: map_at_3
651
- value: 28.997
652
- - type: map_at_5
653
- value: 30.508999999999997
654
- - type: mrr_at_1
655
- value: 28.195999999999998
656
- - type: mrr_at_10
657
- value: 36.719
658
- - type: mrr_at_100
659
- value: 37.674
660
- - type: mrr_at_1000
661
- value: 37.743
662
- - type: mrr_at_3
663
- value: 34.532000000000004
664
- - type: mrr_at_5
665
- value: 35.845
666
- - type: ndcg_at_1
667
- value: 28.195999999999998
668
- - type: ndcg_at_10
669
- value: 36.605
670
- - type: ndcg_at_100
671
- value: 42.524
672
- - type: ndcg_at_1000
673
- value: 45.171
674
- - type: ndcg_at_3
675
- value: 32.574
676
- - type: ndcg_at_5
677
- value: 34.617
678
- - type: precision_at_1
679
- value: 28.195999999999998
680
- - type: precision_at_10
681
- value: 6.598
682
- - type: precision_at_100
683
- value: 1.121
684
- - type: precision_at_1000
685
- value: 0.153
686
- - type: precision_at_3
687
- value: 15.601
688
- - type: precision_at_5
689
- value: 11.073
690
- - type: recall_at_1
691
- value: 22.849
692
- - type: recall_at_10
693
- value: 46.528000000000006
694
- - type: recall_at_100
695
- value: 72.09
696
- - type: recall_at_1000
697
- value: 90.398
698
- - type: recall_at_3
699
- value: 35.116
700
- - type: recall_at_5
701
- value: 40.778
702
- - task:
703
- type: Retrieval
704
- dataset:
705
- type: BeIR/cqadupstack
706
- name: MTEB CQADupstackRetrieval
707
- config: default
708
- split: test
709
- revision: None
710
- metrics:
711
- - type: map_at_1
712
- value: 24.319500000000005
713
- - type: map_at_10
714
- value: 32.530166666666666
715
- - type: map_at_100
716
- value: 33.61566666666667
717
- - type: map_at_1000
718
- value: 33.73808333333333
719
- - type: map_at_3
720
- value: 30.074583333333326
721
- - type: map_at_5
722
- value: 31.429666666666662
723
- - type: mrr_at_1
724
- value: 28.675916666666666
725
- - type: mrr_at_10
726
- value: 36.49308333333334
727
- - type: mrr_at_100
728
- value: 37.310583333333334
729
- - type: mrr_at_1000
730
- value: 37.37616666666666
731
- - type: mrr_at_3
732
- value: 34.283166666666666
733
- - type: mrr_at_5
734
- value: 35.54333333333334
735
- - type: ndcg_at_1
736
- value: 28.675916666666666
737
- - type: ndcg_at_10
738
- value: 37.403416666666665
739
- - type: ndcg_at_100
740
- value: 42.25783333333333
741
- - type: ndcg_at_1000
742
- value: 44.778333333333336
743
- - type: ndcg_at_3
744
- value: 33.17099999999999
745
- - type: ndcg_at_5
746
- value: 35.12666666666667
747
- - type: precision_at_1
748
- value: 28.675916666666666
749
- - type: precision_at_10
750
- value: 6.463083333333334
751
- - type: precision_at_100
752
- value: 1.0585
753
- - type: precision_at_1000
754
- value: 0.14633333333333332
755
- - type: precision_at_3
756
- value: 15.158999999999997
757
- - type: precision_at_5
758
- value: 10.673916666666667
759
- - type: recall_at_1
760
- value: 24.319500000000005
761
- - type: recall_at_10
762
- value: 47.9135
763
- - type: recall_at_100
764
- value: 69.40266666666666
765
- - type: recall_at_1000
766
- value: 87.12566666666666
767
- - type: recall_at_3
768
- value: 36.03149999999999
769
- - type: recall_at_5
770
- value: 41.12791666666668
771
- - task:
772
- type: Retrieval
773
- dataset:
774
- type: BeIR/cqadupstack
775
- name: MTEB CQADupstackStatsRetrieval
776
- config: default
777
- split: test
778
- revision: None
779
- metrics:
780
- - type: map_at_1
781
- value: 22.997
782
- - type: map_at_10
783
- value: 28.754999999999995
784
- - type: map_at_100
785
- value: 29.555999999999997
786
- - type: map_at_1000
787
- value: 29.653000000000002
788
- - type: map_at_3
789
- value: 27.069
790
- - type: map_at_5
791
- value: 27.884999999999998
792
- - type: mrr_at_1
793
- value: 25.767
794
- - type: mrr_at_10
795
- value: 31.195
796
- - type: mrr_at_100
797
- value: 31.964
798
- - type: mrr_at_1000
799
- value: 32.039
800
- - type: mrr_at_3
801
- value: 29.601
802
- - type: mrr_at_5
803
- value: 30.345
804
- - type: ndcg_at_1
805
- value: 25.767
806
- - type: ndcg_at_10
807
- value: 32.234
808
- - type: ndcg_at_100
809
- value: 36.461
810
- - type: ndcg_at_1000
811
- value: 39.005
812
- - type: ndcg_at_3
813
- value: 29.052
814
- - type: ndcg_at_5
815
- value: 30.248
816
- - type: precision_at_1
817
- value: 25.767
818
- - type: precision_at_10
819
- value: 4.893
820
- - type: precision_at_100
821
- value: 0.761
822
- - type: precision_at_1000
823
- value: 0.105
824
- - type: precision_at_3
825
- value: 12.219
826
- - type: precision_at_5
827
- value: 8.19
828
- - type: recall_at_1
829
- value: 22.997
830
- - type: recall_at_10
831
- value: 40.652
832
- - type: recall_at_100
833
- value: 60.302
834
- - type: recall_at_1000
835
- value: 79.17999999999999
836
- - type: recall_at_3
837
- value: 31.680999999999997
838
- - type: recall_at_5
839
- value: 34.698
840
- - task:
841
- type: Retrieval
842
- dataset:
843
- type: BeIR/cqadupstack
844
- name: MTEB CQADupstackTexRetrieval
845
- config: default
846
- split: test
847
- revision: None
848
- metrics:
849
- - type: map_at_1
850
- value: 16.3
851
- - type: map_at_10
852
- value: 22.581
853
- - type: map_at_100
854
- value: 23.517
855
- - type: map_at_1000
856
- value: 23.638
857
- - type: map_at_3
858
- value: 20.567
859
- - type: map_at_5
860
- value: 21.688
861
- - type: mrr_at_1
862
- value: 19.683
863
- - type: mrr_at_10
864
- value: 26.185000000000002
865
- - type: mrr_at_100
866
- value: 27.014
867
- - type: mrr_at_1000
868
- value: 27.092
869
- - type: mrr_at_3
870
- value: 24.145
871
- - type: mrr_at_5
872
- value: 25.308999999999997
873
- - type: ndcg_at_1
874
- value: 19.683
875
- - type: ndcg_at_10
876
- value: 26.699
877
- - type: ndcg_at_100
878
- value: 31.35
879
- - type: ndcg_at_1000
880
- value: 34.348
881
- - type: ndcg_at_3
882
- value: 23.026
883
- - type: ndcg_at_5
884
- value: 24.731
885
- - type: precision_at_1
886
- value: 19.683
887
- - type: precision_at_10
888
- value: 4.814
889
- - type: precision_at_100
890
- value: 0.836
891
- - type: precision_at_1000
892
- value: 0.126
893
- - type: precision_at_3
894
- value: 10.782
895
- - type: precision_at_5
896
- value: 7.825
897
- - type: recall_at_1
898
- value: 16.3
899
- - type: recall_at_10
900
- value: 35.521
901
- - type: recall_at_100
902
- value: 56.665
903
- - type: recall_at_1000
904
- value: 78.361
905
- - type: recall_at_3
906
- value: 25.223000000000003
907
- - type: recall_at_5
908
- value: 29.626
909
- - task:
910
- type: Retrieval
911
- dataset:
912
- type: BeIR/cqadupstack
913
- name: MTEB CQADupstackUnixRetrieval
914
- config: default
915
- split: test
916
- revision: None
917
- metrics:
918
- - type: map_at_1
919
- value: 24.596999999999998
920
- - type: map_at_10
921
- value: 32.54
922
- - type: map_at_100
923
- value: 33.548
924
- - type: map_at_1000
925
- value: 33.661
926
- - type: map_at_3
927
- value: 30.134
928
- - type: map_at_5
929
- value: 31.468
930
- - type: mrr_at_1
931
- value: 28.825
932
- - type: mrr_at_10
933
- value: 36.495
934
- - type: mrr_at_100
935
- value: 37.329
936
- - type: mrr_at_1000
937
- value: 37.397999999999996
938
- - type: mrr_at_3
939
- value: 34.359
940
- - type: mrr_at_5
941
- value: 35.53
942
- - type: ndcg_at_1
943
- value: 28.825
944
- - type: ndcg_at_10
945
- value: 37.341
946
- - type: ndcg_at_100
947
- value: 42.221
948
- - type: ndcg_at_1000
949
- value: 44.799
950
- - type: ndcg_at_3
951
- value: 33.058
952
- - type: ndcg_at_5
953
- value: 34.961999999999996
954
- - type: precision_at_1
955
- value: 28.825
956
- - type: precision_at_10
957
- value: 6.175
958
- - type: precision_at_100
959
- value: 0.97
960
- - type: precision_at_1000
961
- value: 0.13
962
- - type: precision_at_3
963
- value: 14.924999999999999
964
- - type: precision_at_5
965
- value: 10.392
966
- - type: recall_at_1
967
- value: 24.596999999999998
968
- - type: recall_at_10
969
- value: 48.067
970
- - type: recall_at_100
971
- value: 69.736
972
- - type: recall_at_1000
973
- value: 87.855
974
- - type: recall_at_3
975
- value: 36.248999999999995
976
- - type: recall_at_5
977
- value: 41.086
978
- - task:
979
- type: Retrieval
980
- dataset:
981
- type: BeIR/cqadupstack
982
- name: MTEB CQADupstackWebmastersRetrieval
983
- config: default
984
- split: test
985
- revision: None
986
- metrics:
987
- - type: map_at_1
988
- value: 24.224999999999998
989
- - type: map_at_10
990
- value: 31.826
991
- - type: map_at_100
992
- value: 33.366
993
- - type: map_at_1000
994
- value: 33.6
995
- - type: map_at_3
996
- value: 29.353
997
- - type: map_at_5
998
- value: 30.736
999
- - type: mrr_at_1
1000
- value: 28.656
1001
- - type: mrr_at_10
1002
- value: 36.092
1003
- - type: mrr_at_100
1004
- value: 37.076
1005
- - type: mrr_at_1000
1006
- value: 37.141999999999996
1007
- - type: mrr_at_3
1008
- value: 33.86
1009
- - type: mrr_at_5
1010
- value: 35.144999999999996
1011
- - type: ndcg_at_1
1012
- value: 28.656
1013
- - type: ndcg_at_10
1014
- value: 37.025999999999996
1015
- - type: ndcg_at_100
1016
- value: 42.844
1017
- - type: ndcg_at_1000
1018
- value: 45.716
1019
- - type: ndcg_at_3
1020
- value: 32.98
1021
- - type: ndcg_at_5
1022
- value: 34.922
1023
- - type: precision_at_1
1024
- value: 28.656
1025
- - type: precision_at_10
1026
- value: 6.976
1027
- - type: precision_at_100
1028
- value: 1.48
1029
- - type: precision_at_1000
1030
- value: 0.23700000000000002
1031
- - type: precision_at_3
1032
- value: 15.348999999999998
1033
- - type: precision_at_5
1034
- value: 11.028
1035
- - type: recall_at_1
1036
- value: 24.224999999999998
1037
- - type: recall_at_10
1038
- value: 46.589999999999996
1039
- - type: recall_at_100
1040
- value: 72.331
1041
- - type: recall_at_1000
1042
- value: 90.891
1043
- - type: recall_at_3
1044
- value: 34.996
1045
- - type: recall_at_5
1046
- value: 40.294000000000004
1047
- - task:
1048
- type: Retrieval
1049
- dataset:
1050
- type: BeIR/cqadupstack
1051
- name: MTEB CQADupstackWordpressRetrieval
1052
- config: default
1053
- split: test
1054
- revision: None
1055
- metrics:
1056
- - type: map_at_1
1057
- value: 20.524
1058
- - type: map_at_10
1059
- value: 27.314
1060
- - type: map_at_100
1061
- value: 28.260999999999996
1062
- - type: map_at_1000
1063
- value: 28.37
1064
- - type: map_at_3
1065
- value: 25.020999999999997
1066
- - type: map_at_5
1067
- value: 25.942
1068
- - type: mrr_at_1
1069
- value: 22.181
1070
- - type: mrr_at_10
1071
- value: 29.149
1072
- - type: mrr_at_100
1073
- value: 30.006
1074
- - type: mrr_at_1000
1075
- value: 30.086000000000002
1076
- - type: mrr_at_3
1077
- value: 26.863999999999997
1078
- - type: mrr_at_5
1079
- value: 27.899
1080
- - type: ndcg_at_1
1081
- value: 22.181
1082
- - type: ndcg_at_10
1083
- value: 31.64
1084
- - type: ndcg_at_100
1085
- value: 36.502
1086
- - type: ndcg_at_1000
1087
- value: 39.176
1088
- - type: ndcg_at_3
1089
- value: 26.901999999999997
1090
- - type: ndcg_at_5
1091
- value: 28.493000000000002
1092
- - type: precision_at_1
1093
- value: 22.181
1094
- - type: precision_at_10
1095
- value: 5.065
1096
- - type: precision_at_100
1097
- value: 0.8099999999999999
1098
- - type: precision_at_1000
1099
- value: 0.11499999999999999
1100
- - type: precision_at_3
1101
- value: 11.214
1102
- - type: precision_at_5
1103
- value: 7.689
1104
- - type: recall_at_1
1105
- value: 20.524
1106
- - type: recall_at_10
1107
- value: 43.29
1108
- - type: recall_at_100
1109
- value: 65.935
1110
- - type: recall_at_1000
1111
- value: 85.80600000000001
1112
- - type: recall_at_3
1113
- value: 30.276999999999997
1114
- - type: recall_at_5
1115
- value: 34.056999999999995
1116
- - task:
1117
- type: Retrieval
1118
- dataset:
1119
- type: climate-fever
1120
- name: MTEB ClimateFEVER
1121
- config: default
1122
- split: test
1123
- revision: None
1124
- metrics:
1125
- - type: map_at_1
1126
- value: 10.488999999999999
1127
- - type: map_at_10
1128
- value: 17.98
1129
- - type: map_at_100
1130
- value: 19.581
1131
- - type: map_at_1000
1132
- value: 19.739
1133
- - type: map_at_3
1134
- value: 15.054
1135
- - type: map_at_5
1136
- value: 16.439999999999998
1137
- - type: mrr_at_1
1138
- value: 23.192
1139
- - type: mrr_at_10
1140
- value: 33.831
1141
- - type: mrr_at_100
1142
- value: 34.833
1143
- - type: mrr_at_1000
1144
- value: 34.881
1145
- - type: mrr_at_3
1146
- value: 30.793
1147
- - type: mrr_at_5
1148
- value: 32.535
1149
- - type: ndcg_at_1
1150
- value: 23.192
1151
- - type: ndcg_at_10
1152
- value: 25.446
1153
- - type: ndcg_at_100
1154
- value: 31.948
1155
- - type: ndcg_at_1000
1156
- value: 35.028
1157
- - type: ndcg_at_3
1158
- value: 20.744
1159
- - type: ndcg_at_5
1160
- value: 22.233
1161
- - type: precision_at_1
1162
- value: 23.192
1163
- - type: precision_at_10
1164
- value: 8.026
1165
- - type: precision_at_100
1166
- value: 1.482
1167
- - type: precision_at_1000
1168
- value: 0.20500000000000002
1169
- - type: precision_at_3
1170
- value: 15.548
1171
- - type: precision_at_5
1172
- value: 11.87
1173
- - type: recall_at_1
1174
- value: 10.488999999999999
1175
- - type: recall_at_10
1176
- value: 30.865
1177
- - type: recall_at_100
1178
- value: 53.428
1179
- - type: recall_at_1000
1180
- value: 70.89
1181
- - type: recall_at_3
1182
- value: 19.245
1183
- - type: recall_at_5
1184
- value: 23.657
1185
- - task:
1186
- type: Retrieval
1187
- dataset:
1188
- type: dbpedia-entity
1189
- name: MTEB DBPedia
1190
- config: default
1191
- split: test
1192
- revision: None
1193
- metrics:
1194
- - type: map_at_1
1195
- value: 7.123
1196
- - type: map_at_10
1197
- value: 14.448
1198
- - type: map_at_100
1199
- value: 19.798
1200
- - type: map_at_1000
1201
- value: 21.082
1202
- - type: map_at_3
1203
- value: 10.815
1204
- - type: map_at_5
1205
- value: 12.422
1206
- - type: mrr_at_1
1207
- value: 53.5
1208
- - type: mrr_at_10
1209
- value: 63.117999999999995
1210
- - type: mrr_at_100
1211
- value: 63.617999999999995
1212
- - type: mrr_at_1000
1213
- value: 63.63799999999999
1214
- - type: mrr_at_3
1215
- value: 60.708
1216
- - type: mrr_at_5
1217
- value: 62.171
1218
- - type: ndcg_at_1
1219
- value: 42.125
1220
- - type: ndcg_at_10
1221
- value: 31.703
1222
- - type: ndcg_at_100
1223
- value: 35.935
1224
- - type: ndcg_at_1000
1225
- value: 43.173
1226
- - type: ndcg_at_3
1227
- value: 35.498000000000005
1228
- - type: ndcg_at_5
1229
- value: 33.645
1230
- - type: precision_at_1
1231
- value: 53.5
1232
- - type: precision_at_10
1233
- value: 25.025
1234
- - type: precision_at_100
1235
- value: 8.19
1236
- - type: precision_at_1000
1237
- value: 1.806
1238
- - type: precision_at_3
1239
- value: 39.083
1240
- - type: precision_at_5
1241
- value: 33.050000000000004
1242
- - type: recall_at_1
1243
- value: 7.123
1244
- - type: recall_at_10
1245
- value: 19.581
1246
- - type: recall_at_100
1247
- value: 42.061
1248
- - type: recall_at_1000
1249
- value: 65.879
1250
- - type: recall_at_3
1251
- value: 12.026
1252
- - type: recall_at_5
1253
- value: 14.846
1254
- - task:
1255
- type: Classification
1256
- dataset:
1257
- type: mteb/emotion
1258
- name: MTEB EmotionClassification
1259
- config: default
1260
- split: test
1261
- revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
1262
- metrics:
1263
- - type: accuracy
1264
- value: 41.24
1265
- - type: f1
1266
- value: 36.76174115773002
1267
- - task:
1268
- type: Retrieval
1269
- dataset:
1270
- type: fever
1271
- name: MTEB FEVER
1272
- config: default
1273
- split: test
1274
- revision: None
1275
- metrics:
1276
- - type: map_at_1
1277
- value: 47.821999999999996
1278
- - type: map_at_10
1279
- value: 59.794000000000004
1280
- - type: map_at_100
1281
- value: 60.316
1282
- - type: map_at_1000
1283
- value: 60.34
1284
- - type: map_at_3
1285
- value: 57.202
1286
- - type: map_at_5
1287
- value: 58.823
1288
- - type: mrr_at_1
1289
- value: 51.485
1290
- - type: mrr_at_10
1291
- value: 63.709
1292
- - type: mrr_at_100
1293
- value: 64.144
1294
- - type: mrr_at_1000
1295
- value: 64.158
1296
- - type: mrr_at_3
1297
- value: 61.251
1298
- - type: mrr_at_5
1299
- value: 62.818
1300
- - type: ndcg_at_1
1301
- value: 51.485
1302
- - type: ndcg_at_10
1303
- value: 66.097
1304
- - type: ndcg_at_100
1305
- value: 68.37
1306
- - type: ndcg_at_1000
1307
- value: 68.916
1308
- - type: ndcg_at_3
1309
- value: 61.12800000000001
1310
- - type: ndcg_at_5
1311
- value: 63.885000000000005
1312
- - type: precision_at_1
1313
- value: 51.485
1314
- - type: precision_at_10
1315
- value: 8.956999999999999
1316
- - type: precision_at_100
1317
- value: 1.02
1318
- - type: precision_at_1000
1319
- value: 0.108
1320
- - type: precision_at_3
1321
- value: 24.807000000000002
1322
- - type: precision_at_5
1323
- value: 16.387999999999998
1324
- - type: recall_at_1
1325
- value: 47.821999999999996
1326
- - type: recall_at_10
1327
- value: 81.773
1328
- - type: recall_at_100
1329
- value: 91.731
1330
- - type: recall_at_1000
1331
- value: 95.649
1332
- - type: recall_at_3
1333
- value: 68.349
1334
- - type: recall_at_5
1335
- value: 75.093
1336
- - task:
1337
- type: Retrieval
1338
- dataset:
1339
- type: fiqa
1340
- name: MTEB FiQA2018
1341
- config: default
1342
- split: test
1343
- revision: None
1344
- metrics:
1345
- - type: map_at_1
1346
- value: 15.662999999999998
1347
- - type: map_at_10
1348
- value: 25.726
1349
- - type: map_at_100
1350
- value: 27.581
1351
- - type: map_at_1000
1352
- value: 27.772000000000002
1353
- - type: map_at_3
1354
- value: 21.859
1355
- - type: map_at_5
1356
- value: 24.058
1357
- - type: mrr_at_1
1358
- value: 30.247
1359
- - type: mrr_at_10
1360
- value: 39.581
1361
- - type: mrr_at_100
1362
- value: 40.594
1363
- - type: mrr_at_1000
1364
- value: 40.647
1365
- - type: mrr_at_3
1366
- value: 37.166
1367
- - type: mrr_at_5
1368
- value: 38.585
1369
- - type: ndcg_at_1
1370
- value: 30.247
1371
- - type: ndcg_at_10
1372
- value: 32.934999999999995
1373
- - type: ndcg_at_100
1374
- value: 40.062999999999995
1375
- - type: ndcg_at_1000
1376
- value: 43.492
1377
- - type: ndcg_at_3
1378
- value: 28.871000000000002
1379
- - type: ndcg_at_5
1380
- value: 30.492
1381
- - type: precision_at_1
1382
- value: 30.247
1383
- - type: precision_at_10
1384
- value: 9.522
1385
- - type: precision_at_100
1386
- value: 1.645
1387
- - type: precision_at_1000
1388
- value: 0.22499999999999998
1389
- - type: precision_at_3
1390
- value: 19.136
1391
- - type: precision_at_5
1392
- value: 14.753
1393
- - type: recall_at_1
1394
- value: 15.662999999999998
1395
- - type: recall_at_10
1396
- value: 39.595
1397
- - type: recall_at_100
1398
- value: 66.49199999999999
1399
- - type: recall_at_1000
1400
- value: 87.19
1401
- - type: recall_at_3
1402
- value: 26.346999999999998
1403
- - type: recall_at_5
1404
- value: 32.423
1405
- - task:
1406
- type: Retrieval
1407
- dataset:
1408
- type: hotpotqa
1409
- name: MTEB HotpotQA
1410
- config: default
1411
- split: test
1412
- revision: None
1413
- metrics:
1414
- - type: map_at_1
1415
- value: 30.176
1416
- - type: map_at_10
1417
- value: 42.684
1418
- - type: map_at_100
1419
- value: 43.582
1420
- - type: map_at_1000
1421
- value: 43.668
1422
- - type: map_at_3
1423
- value: 39.964
1424
- - type: map_at_5
1425
- value: 41.589
1426
- - type: mrr_at_1
1427
- value: 60.351
1428
- - type: mrr_at_10
1429
- value: 67.669
1430
- - type: mrr_at_100
1431
- value: 68.089
1432
- - type: mrr_at_1000
1433
- value: 68.111
1434
- - type: mrr_at_3
1435
- value: 66.144
1436
- - type: mrr_at_5
1437
- value: 67.125
1438
- - type: ndcg_at_1
1439
- value: 60.351
1440
- - type: ndcg_at_10
1441
- value: 51.602000000000004
1442
- - type: ndcg_at_100
1443
- value: 55.186
1444
- - type: ndcg_at_1000
1445
- value: 56.96
1446
- - type: ndcg_at_3
1447
- value: 47.251
1448
- - type: ndcg_at_5
1449
- value: 49.584
1450
- - type: precision_at_1
1451
- value: 60.351
1452
- - type: precision_at_10
1453
- value: 10.804
1454
- - type: precision_at_100
1455
- value: 1.3639999999999999
1456
- - type: precision_at_1000
1457
- value: 0.16
1458
- - type: precision_at_3
1459
- value: 29.561
1460
- - type: precision_at_5
1461
- value: 19.581
1462
- - type: recall_at_1
1463
- value: 30.176
1464
- - type: recall_at_10
1465
- value: 54.018
1466
- - type: recall_at_100
1467
- value: 68.22399999999999
1468
- - type: recall_at_1000
1469
- value: 79.97999999999999
1470
- - type: recall_at_3
1471
- value: 44.342
1472
- - type: recall_at_5
1473
- value: 48.953
1474
- - task:
1475
- type: Classification
1476
- dataset:
1477
- type: mteb/imdb
1478
- name: MTEB ImdbClassification
1479
- config: default
1480
- split: test
1481
- revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
1482
- metrics:
1483
- - type: accuracy
1484
- value: 71.28320000000001
1485
- - type: ap
1486
- value: 65.20730065157146
1487
- - type: f1
1488
- value: 71.19193683354304
1489
- - task:
1490
- type: Retrieval
1491
- dataset:
1492
- type: msmarco
1493
- name: MTEB MSMARCO
1494
- config: default
1495
- split: dev
1496
- revision: None
1497
- metrics:
1498
- - type: map_at_1
1499
- value: 19.686
1500
- - type: map_at_10
1501
- value: 31.189
1502
- - type: map_at_100
1503
- value: 32.368
1504
- - type: map_at_1000
1505
- value: 32.43
1506
- - type: map_at_3
1507
- value: 27.577
1508
- - type: map_at_5
1509
- value: 29.603
1510
- - type: mrr_at_1
1511
- value: 20.201
1512
- - type: mrr_at_10
1513
- value: 31.762
1514
- - type: mrr_at_100
1515
- value: 32.882
1516
- - type: mrr_at_1000
1517
- value: 32.937
1518
- - type: mrr_at_3
1519
- value: 28.177999999999997
1520
- - type: mrr_at_5
1521
- value: 30.212
1522
- - type: ndcg_at_1
1523
- value: 20.215
1524
- - type: ndcg_at_10
1525
- value: 37.730999999999995
1526
- - type: ndcg_at_100
1527
- value: 43.501
1528
- - type: ndcg_at_1000
1529
- value: 45.031
1530
- - type: ndcg_at_3
1531
- value: 30.336000000000002
1532
- - type: ndcg_at_5
1533
- value: 33.961000000000006
1534
- - type: precision_at_1
1535
- value: 20.215
1536
- - type: precision_at_10
1537
- value: 6.036
1538
- - type: precision_at_100
1539
- value: 0.895
1540
- - type: precision_at_1000
1541
- value: 0.10300000000000001
1542
- - type: precision_at_3
1543
- value: 13.028
1544
- - type: precision_at_5
1545
- value: 9.633
1546
- - type: recall_at_1
1547
- value: 19.686
1548
- - type: recall_at_10
1549
- value: 57.867999999999995
1550
- - type: recall_at_100
1551
- value: 84.758
1552
- - type: recall_at_1000
1553
- value: 96.44500000000001
1554
- - type: recall_at_3
1555
- value: 37.726
1556
- - type: recall_at_5
1557
- value: 46.415
1558
- - task:
1559
- type: Classification
1560
- dataset:
1561
- type: mteb/mtop_domain
1562
- name: MTEB MTOPDomainClassification (en)
1563
- config: en
1564
- split: test
1565
- revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
1566
- metrics:
1567
- - type: accuracy
1568
- value: 89.76972184222525
1569
- - type: f1
1570
- value: 89.11949030406099
1571
- - task:
1572
- type: Classification
1573
- dataset:
1574
- type: mteb/mtop_intent
1575
- name: MTEB MTOPIntentClassification (en)
1576
- config: en
1577
- split: test
1578
- revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
1579
- metrics:
1580
- - type: accuracy
1581
- value: 55.57455540355677
1582
- - type: f1
1583
- value: 39.344920096224506
1584
- - task:
1585
- type: Classification
1586
- dataset:
1587
- type: mteb/amazon_massive_intent
1588
- name: MTEB MassiveIntentClassification (en)
1589
- config: en
1590
- split: test
1591
- revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1592
- metrics:
1593
- - type: accuracy
1594
- value: 63.772696704774724
1595
- - type: f1
1596
- value: 60.70041499812703
1597
- - task:
1598
- type: Classification
1599
- dataset:
1600
- type: mteb/amazon_massive_scenario
1601
- name: MTEB MassiveScenarioClassification (en)
1602
- config: en
1603
- split: test
1604
- revision: 7d571f92784cd94a019292a1f45445077d0ef634
1605
- metrics:
1606
- - type: accuracy
1607
- value: 69.16274377942166
1608
- - type: f1
1609
- value: 68.06744012208019
1610
- - task:
1611
- type: Clustering
1612
- dataset:
1613
- type: mteb/medrxiv-clustering-p2p
1614
- name: MTEB MedrxivClusteringP2P
1615
- config: default
1616
- split: test
1617
- revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
1618
- metrics:
1619
- - type: v_measure
1620
- value: 31.822626760555522
1621
- - task:
1622
- type: Clustering
1623
- dataset:
1624
- type: mteb/medrxiv-clustering-s2s
1625
- name: MTEB MedrxivClusteringS2S
1626
- config: default
1627
- split: test
1628
- revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
1629
- metrics:
1630
- - type: v_measure
1631
- value: 27.98469036402807
1632
- - task:
1633
- type: Reranking
1634
- dataset:
1635
- type: mteb/mind_small
1636
- name: MTEB MindSmallReranking
1637
- config: default
1638
- split: test
1639
- revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
1640
- metrics:
1641
- - type: map
1642
- value: 30.911144124209166
1643
- - type: mrr
1644
- value: 31.950116175672292
1645
- - task:
1646
- type: Retrieval
1647
- dataset:
1648
- type: nfcorpus
1649
- name: MTEB NFCorpus
1650
- config: default
1651
- split: test
1652
- revision: None
1653
- metrics:
1654
- - type: map_at_1
1655
- value: 5.157
1656
- - type: map_at_10
1657
- value: 11.086
1658
- - type: map_at_100
1659
- value: 13.927
1660
- - type: map_at_1000
1661
- value: 15.226999999999999
1662
- - type: map_at_3
1663
- value: 8.525
1664
- - type: map_at_5
1665
- value: 9.767000000000001
1666
- - type: mrr_at_1
1667
- value: 43.344
1668
- - type: mrr_at_10
1669
- value: 51.646
1670
- - type: mrr_at_100
1671
- value: 52.212
1672
- - type: mrr_at_1000
1673
- value: 52.263999999999996
1674
- - type: mrr_at_3
1675
- value: 50.052
1676
- - type: mrr_at_5
1677
- value: 51.166
1678
- - type: ndcg_at_1
1679
- value: 41.949999999999996
1680
- - type: ndcg_at_10
1681
- value: 30.552
1682
- - type: ndcg_at_100
1683
- value: 28.409000000000002
1684
- - type: ndcg_at_1000
1685
- value: 37.328
1686
- - type: ndcg_at_3
1687
- value: 37.114000000000004
1688
- - type: ndcg_at_5
1689
- value: 34.117999999999995
1690
- - type: precision_at_1
1691
- value: 43.344
1692
- - type: precision_at_10
1693
- value: 22.198
1694
- - type: precision_at_100
1695
- value: 7.234999999999999
1696
- - type: precision_at_1000
1697
- value: 2.013
1698
- - type: precision_at_3
1699
- value: 34.675
1700
- - type: precision_at_5
1701
- value: 29.04
1702
- - type: recall_at_1
1703
- value: 5.157
1704
- - type: recall_at_10
1705
- value: 13.999
1706
- - type: recall_at_100
1707
- value: 28.796
1708
- - type: recall_at_1000
1709
- value: 60.84
1710
- - type: recall_at_3
1711
- value: 9.603
1712
- - type: recall_at_5
1713
- value: 11.638
1714
- - task:
1715
- type: Retrieval
1716
- dataset:
1717
- type: nq
1718
- name: MTEB NQ
1719
- config: default
1720
- split: test
1721
- revision: None
1722
- metrics:
1723
- - type: map_at_1
1724
- value: 33.024
1725
- - type: map_at_10
1726
- value: 47.229
1727
- - type: map_at_100
1728
- value: 48.195
1729
- - type: map_at_1000
1730
- value: 48.229
1731
- - type: map_at_3
1732
- value: 43.356
1733
- - type: map_at_5
1734
- value: 45.857
1735
- - type: mrr_at_1
1736
- value: 36.848
1737
- - type: mrr_at_10
1738
- value: 49.801
1739
- - type: mrr_at_100
1740
- value: 50.532999999999994
1741
- - type: mrr_at_1000
1742
- value: 50.556
1743
- - type: mrr_at_3
1744
- value: 46.605999999999995
1745
- - type: mrr_at_5
1746
- value: 48.735
1747
- - type: ndcg_at_1
1748
- value: 36.848
1749
- - type: ndcg_at_10
1750
- value: 54.202
1751
- - type: ndcg_at_100
1752
- value: 58.436
1753
- - type: ndcg_at_1000
1754
- value: 59.252
1755
- - type: ndcg_at_3
1756
- value: 47.082
1757
- - type: ndcg_at_5
1758
- value: 51.254
1759
- - type: precision_at_1
1760
- value: 36.848
1761
- - type: precision_at_10
1762
- value: 8.636000000000001
1763
- - type: precision_at_100
1764
- value: 1.105
1765
- - type: precision_at_1000
1766
- value: 0.11800000000000001
1767
- - type: precision_at_3
1768
- value: 21.08
1769
- - type: precision_at_5
1770
- value: 15.07
1771
- - type: recall_at_1
1772
- value: 33.024
1773
- - type: recall_at_10
1774
- value: 72.699
1775
- - type: recall_at_100
1776
- value: 91.387
1777
- - type: recall_at_1000
1778
- value: 97.482
1779
- - type: recall_at_3
1780
- value: 54.604
1781
- - type: recall_at_5
1782
- value: 64.224
1783
- - task:
1784
- type: Retrieval
1785
- dataset:
1786
- type: quora
1787
- name: MTEB QuoraRetrieval
1788
- config: default
1789
- split: test
1790
- revision: None
1791
- metrics:
1792
- - type: map_at_1
1793
- value: 69.742
1794
- - type: map_at_10
1795
- value: 83.43
1796
- - type: map_at_100
1797
- value: 84.09400000000001
1798
- - type: map_at_1000
1799
- value: 84.113
1800
- - type: map_at_3
1801
- value: 80.464
1802
- - type: map_at_5
1803
- value: 82.356
1804
- - type: mrr_at_1
1805
- value: 80.31
1806
- - type: mrr_at_10
1807
- value: 86.629
1808
- - type: mrr_at_100
1809
- value: 86.753
1810
- - type: mrr_at_1000
1811
- value: 86.75399999999999
1812
- - type: mrr_at_3
1813
- value: 85.59
1814
- - type: mrr_at_5
1815
- value: 86.346
1816
- - type: ndcg_at_1
1817
- value: 80.28999999999999
1818
- - type: ndcg_at_10
1819
- value: 87.323
1820
- - type: ndcg_at_100
1821
- value: 88.682
1822
- - type: ndcg_at_1000
1823
- value: 88.812
1824
- - type: ndcg_at_3
1825
- value: 84.373
1826
- - type: ndcg_at_5
1827
- value: 86.065
1828
- - type: precision_at_1
1829
- value: 80.28999999999999
1830
- - type: precision_at_10
1831
- value: 13.239999999999998
1832
- - type: precision_at_100
1833
- value: 1.521
1834
- - type: precision_at_1000
1835
- value: 0.156
1836
- - type: precision_at_3
1837
- value: 36.827
1838
- - type: precision_at_5
1839
- value: 24.272
1840
- - type: recall_at_1
1841
- value: 69.742
1842
- - type: recall_at_10
1843
- value: 94.645
1844
- - type: recall_at_100
1845
- value: 99.375
1846
- - type: recall_at_1000
1847
- value: 99.97200000000001
1848
- - type: recall_at_3
1849
- value: 86.18400000000001
1850
- - type: recall_at_5
1851
- value: 90.958
1852
- - task:
1853
- type: Clustering
1854
- dataset:
1855
- type: mteb/reddit-clustering
1856
- name: MTEB RedditClustering
1857
- config: default
1858
- split: test
1859
- revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
1860
- metrics:
1861
- - type: v_measure
1862
- value: 50.52987829115787
1863
- - task:
1864
- type: Clustering
1865
- dataset:
1866
- type: mteb/reddit-clustering-p2p
1867
- name: MTEB RedditClusteringP2P
1868
- config: default
1869
- split: test
1870
- revision: 282350215ef01743dc01b456c7f5241fa8937f16
1871
- metrics:
1872
- - type: v_measure
1873
- value: 56.73289360025561
1874
- - task:
1875
- type: Retrieval
1876
- dataset:
1877
- type: scidocs
1878
- name: MTEB SCIDOCS
1879
- config: default
1880
- split: test
1881
- revision: None
1882
- metrics:
1883
- - type: map_at_1
1884
- value: 4.473
1885
- - type: map_at_10
1886
- value: 10.953
1887
- - type: map_at_100
1888
- value: 12.842
1889
- - type: map_at_1000
1890
- value: 13.122
1891
- - type: map_at_3
1892
- value: 7.863
1893
- - type: map_at_5
1894
- value: 9.376
1895
- - type: mrr_at_1
1896
- value: 22.0
1897
- - type: mrr_at_10
1898
- value: 32.639
1899
- - type: mrr_at_100
1900
- value: 33.658
1901
- - type: mrr_at_1000
1902
- value: 33.727000000000004
1903
- - type: mrr_at_3
1904
- value: 29.232999999999997
1905
- - type: mrr_at_5
1906
- value: 31.373
1907
- - type: ndcg_at_1
1908
- value: 22.0
1909
- - type: ndcg_at_10
1910
- value: 18.736
1911
- - type: ndcg_at_100
1912
- value: 26.209
1913
- - type: ndcg_at_1000
1914
- value: 31.427
1915
- - type: ndcg_at_3
1916
- value: 17.740000000000002
1917
- - type: ndcg_at_5
1918
- value: 15.625
1919
- - type: precision_at_1
1920
- value: 22.0
1921
- - type: precision_at_10
1922
- value: 9.700000000000001
1923
- - type: precision_at_100
1924
- value: 2.052
1925
- - type: precision_at_1000
1926
- value: 0.331
1927
- - type: precision_at_3
1928
- value: 16.533
1929
- - type: precision_at_5
1930
- value: 13.74
1931
- - type: recall_at_1
1932
- value: 4.473
1933
- - type: recall_at_10
1934
- value: 19.627
1935
- - type: recall_at_100
1936
- value: 41.63
1937
- - type: recall_at_1000
1938
- value: 67.173
1939
- - type: recall_at_3
1940
- value: 10.067
1941
- - type: recall_at_5
1942
- value: 13.927
1943
- - task:
1944
- type: STS
1945
- dataset:
1946
- type: mteb/sickr-sts
1947
- name: MTEB SICK-R
1948
- config: default
1949
- split: test
1950
- revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
1951
- metrics:
1952
- - type: cos_sim_pearson
1953
- value: 83.27314719076216
1954
- - type: cos_sim_spearman
1955
- value: 76.39295628838427
1956
- - type: euclidean_pearson
1957
- value: 80.38849931283136
1958
- - type: euclidean_spearman
1959
- value: 76.39295685543406
1960
- - type: manhattan_pearson
1961
- value: 80.28382869912794
1962
- - type: manhattan_spearman
1963
- value: 76.28362123227473
1964
- - task:
1965
- type: STS
1966
- dataset:
1967
- type: mteb/sts12-sts
1968
- name: MTEB STS12
1969
- config: default
1970
- split: test
1971
- revision: a0d554a64d88156834ff5ae9920b964011b16384
1972
- metrics:
1973
- - type: cos_sim_pearson
1974
- value: 82.36858074786585
1975
- - type: cos_sim_spearman
1976
- value: 72.81528838052759
1977
- - type: euclidean_pearson
1978
- value: 78.83576324502302
1979
- - type: euclidean_spearman
1980
- value: 72.8152880167174
1981
- - type: manhattan_pearson
1982
- value: 78.81284819385367
1983
- - type: manhattan_spearman
1984
- value: 72.76091465928633
1985
- - task:
1986
- type: STS
1987
- dataset:
1988
- type: mteb/sts13-sts
1989
- name: MTEB STS13
1990
- config: default
1991
- split: test
1992
- revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
1993
- metrics:
1994
- - type: cos_sim_pearson
1995
- value: 81.08132718998489
1996
- - type: cos_sim_spearman
1997
- value: 82.00988939015869
1998
- - type: euclidean_pearson
1999
- value: 81.02243847451692
2000
- - type: euclidean_spearman
2001
- value: 82.00992010206836
2002
- - type: manhattan_pearson
2003
- value: 80.97749306075134
2004
- - type: manhattan_spearman
2005
- value: 81.97800195109437
2006
- - task:
2007
- type: STS
2008
- dataset:
2009
- type: mteb/sts14-sts
2010
- name: MTEB STS14
2011
- config: default
2012
- split: test
2013
- revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
2014
- metrics:
2015
- - type: cos_sim_pearson
2016
- value: 80.83442047735284
2017
- - type: cos_sim_spearman
2018
- value: 77.50930325127395
2019
- - type: euclidean_pearson
2020
- value: 79.34941050260747
2021
- - type: euclidean_spearman
2022
- value: 77.50930324686452
2023
- - type: manhattan_pearson
2024
- value: 79.28081079289419
2025
- - type: manhattan_spearman
2026
- value: 77.42311420628891
2027
- - task:
2028
- type: STS
2029
- dataset:
2030
- type: mteb/sts15-sts
2031
- name: MTEB STS15
2032
- config: default
2033
- split: test
2034
- revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
2035
- metrics:
2036
- - type: cos_sim_pearson
2037
- value: 85.70132781546333
2038
- - type: cos_sim_spearman
2039
- value: 86.58415907086527
2040
- - type: euclidean_pearson
2041
- value: 85.63892869817083
2042
- - type: euclidean_spearman
2043
- value: 86.58415907086527
2044
- - type: manhattan_pearson
2045
- value: 85.56054168116064
2046
- - type: manhattan_spearman
2047
- value: 86.50292824173809
2048
- - task:
2049
- type: STS
2050
- dataset:
2051
- type: mteb/sts16-sts
2052
- name: MTEB STS16
2053
- config: default
2054
- split: test
2055
- revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
2056
- metrics:
2057
- - type: cos_sim_pearson
2058
- value: 81.48780971731246
2059
- - type: cos_sim_spearman
2060
- value: 82.79818891852887
2061
- - type: euclidean_pearson
2062
- value: 81.93990926192305
2063
- - type: euclidean_spearman
2064
- value: 82.79818891852887
2065
- - type: manhattan_pearson
2066
- value: 81.97538189750966
2067
- - type: manhattan_spearman
2068
- value: 82.88761825524075
2069
- - task:
2070
- type: STS
2071
- dataset:
2072
- type: mteb/sts17-crosslingual-sts
2073
- name: MTEB STS17 (en-en)
2074
- config: en-en
2075
- split: test
2076
- revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
2077
- metrics:
2078
- - type: cos_sim_pearson
2079
- value: 88.4989925729811
2080
- - type: cos_sim_spearman
2081
- value: 88.47370962620529
2082
- - type: euclidean_pearson
2083
- value: 88.2312980339956
2084
- - type: euclidean_spearman
2085
- value: 88.47370962620529
2086
- - type: manhattan_pearson
2087
- value: 88.15570940509707
2088
- - type: manhattan_spearman
2089
- value: 88.36900000569275
2090
- - task:
2091
- type: STS
2092
- dataset:
2093
- type: mteb/sts22-crosslingual-sts
2094
- name: MTEB STS22 (en)
2095
- config: en
2096
- split: test
2097
- revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
2098
- metrics:
2099
- - type: cos_sim_pearson
2100
- value: 63.90740805015967
2101
- - type: cos_sim_spearman
2102
- value: 63.968359064784444
2103
- - type: euclidean_pearson
2104
- value: 64.67928113832794
2105
- - type: euclidean_spearman
2106
- value: 63.968359064784444
2107
- - type: manhattan_pearson
2108
- value: 63.92597430517486
2109
- - type: manhattan_spearman
2110
- value: 63.31372007361158
2111
- - task:
2112
- type: STS
2113
- dataset:
2114
- type: mteb/stsbenchmark-sts
2115
- name: MTEB STSBenchmark
2116
- config: default
2117
- split: test
2118
- revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
- metrics:
- - type: cos_sim_pearson
- value: 82.56902991447632
- - type: cos_sim_spearman
- value: 83.16262853325924
- - type: euclidean_pearson
- value: 83.47693312869555
- - type: euclidean_spearman
- value: 83.16266829656969
- - type: manhattan_pearson
- value: 83.51067558632968
- - type: manhattan_spearman
- value: 83.25136388306153
- - task:
- type: Reranking
- dataset:
- type: mteb/scidocs-reranking
- name: MTEB SciDocsRR
- config: default
- split: test
- revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
- metrics:
- - type: map
- value: 80.1518040851234
- - type: mrr
- value: 94.49083052024228
- - task:
- type: Retrieval
- dataset:
- type: scifact
- name: MTEB SciFact
- config: default
- split: test
- revision: None
- metrics:
- - type: map_at_1
- value: 50.661
- - type: map_at_10
- value: 59.816
- - type: map_at_100
- value: 60.412
- - type: map_at_1000
- value: 60.446999999999996
- - type: map_at_3
- value: 56.567
- - type: map_at_5
- value: 58.45
- - type: mrr_at_1
- value: 53.667
- - type: mrr_at_10
- value: 61.342
- - type: mrr_at_100
- value: 61.8
- - type: mrr_at_1000
- value: 61.836
- - type: mrr_at_3
- value: 59.111000000000004
- - type: mrr_at_5
- value: 60.411
- - type: ndcg_at_1
- value: 53.667
- - type: ndcg_at_10
- value: 64.488
- - type: ndcg_at_100
- value: 67.291
- - type: ndcg_at_1000
- value: 68.338
- - type: ndcg_at_3
- value: 59.101000000000006
- - type: ndcg_at_5
- value: 61.812999999999995
- - type: precision_at_1
- value: 53.667
- - type: precision_at_10
- value: 8.799999999999999
- - type: precision_at_100
- value: 1.0330000000000001
- - type: precision_at_1000
- value: 0.11199999999999999
- - type: precision_at_3
- value: 23.0
- - type: precision_at_5
- value: 15.6
- - type: recall_at_1
- value: 50.661
- - type: recall_at_10
- value: 77.422
- - type: recall_at_100
- value: 90.667
- - type: recall_at_1000
- value: 99.0
- - type: recall_at_3
- value: 63.144
- - type: recall_at_5
- value: 69.817
- - task:
- type: PairClassification
- dataset:
- type: mteb/sprintduplicatequestions-pairclassification
- name: MTEB SprintDuplicateQuestions
- config: default
- split: test
- revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
- metrics:
- - type: cos_sim_accuracy
- value: 99.81287128712871
- - type: cos_sim_ap
- value: 94.91998708151321
- - type: cos_sim_f1
- value: 90.36206017338093
- - type: cos_sim_precision
- value: 92.19562955254943
- - type: cos_sim_recall
- value: 88.6
- - type: dot_accuracy
- value: 99.81287128712871
- - type: dot_ap
- value: 94.91998708151321
- - type: dot_f1
- value: 90.36206017338093
- - type: dot_precision
- value: 92.19562955254943
- - type: dot_recall
- value: 88.6
- - type: euclidean_accuracy
- value: 99.81287128712871
- - type: euclidean_ap
- value: 94.9199944407842
- - type: euclidean_f1
- value: 90.36206017338093
- - type: euclidean_precision
- value: 92.19562955254943
- - type: euclidean_recall
- value: 88.6
- - type: manhattan_accuracy
- value: 99.8108910891089
- - type: manhattan_ap
- value: 94.83783896670839
- - type: manhattan_f1
- value: 90.27989821882952
- - type: manhattan_precision
- value: 91.91709844559585
- - type: manhattan_recall
- value: 88.7
- - type: max_accuracy
- value: 99.81287128712871
- - type: max_ap
- value: 94.9199944407842
- - type: max_f1
- value: 90.36206017338093
- - task:
- type: Clustering
- dataset:
- type: mteb/stackexchange-clustering
- name: MTEB StackExchangeClustering
- config: default
- split: test
- revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
- metrics:
- - type: v_measure
- value: 56.165546412944714
- - task:
- type: Clustering
- dataset:
- type: mteb/stackexchange-clustering-p2p
- name: MTEB StackExchangeClusteringP2P
- config: default
- split: test
- revision: 815ca46b2622cec33ccafc3735d572c266efdb44
- metrics:
- - type: v_measure
- value: 34.19894321136813
- - task:
- type: Reranking
- dataset:
- type: mteb/stackoverflowdupquestions-reranking
- name: MTEB StackOverflowDupQuestions
- config: default
- split: test
- revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
- metrics:
- - type: map
- value: 50.02944308369115
- - type: mrr
- value: 50.63055714710127
- - task:
- type: Summarization
- dataset:
- type: mteb/summeval
- name: MTEB SummEval
- config: default
- split: test
- revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
- metrics:
- - type: cos_sim_pearson
- value: 31.3377433394579
- - type: cos_sim_spearman
- value: 30.877807383527983
- - type: dot_pearson
- value: 31.337752376327405
- - type: dot_spearman
- value: 30.877807383527983
- - task:
- type: Retrieval
- dataset:
- type: trec-covid
- name: MTEB TRECCOVID
- config: default
- split: test
- revision: None
- metrics:
- - type: map_at_1
- value: 0.20500000000000002
- - type: map_at_10
- value: 1.6099999999999999
- - type: map_at_100
- value: 8.635
- - type: map_at_1000
- value: 20.419999999999998
- - type: map_at_3
- value: 0.59
- - type: map_at_5
- value: 0.9249999999999999
- - type: mrr_at_1
- value: 80.0
- - type: mrr_at_10
- value: 88.452
- - type: mrr_at_100
- value: 88.452
- - type: mrr_at_1000
- value: 88.452
- - type: mrr_at_3
- value: 87.667
- - type: mrr_at_5
- value: 88.167
- - type: ndcg_at_1
- value: 77.0
- - type: ndcg_at_10
- value: 67.079
- - type: ndcg_at_100
- value: 49.937
- - type: ndcg_at_1000
- value: 44.031
- - type: ndcg_at_3
- value: 73.123
- - type: ndcg_at_5
- value: 70.435
- - type: precision_at_1
- value: 80.0
- - type: precision_at_10
- value: 70.39999999999999
- - type: precision_at_100
- value: 51.25999999999999
- - type: precision_at_1000
- value: 19.698
- - type: precision_at_3
- value: 78.0
- - type: precision_at_5
- value: 75.2
- - type: recall_at_1
- value: 0.20500000000000002
- - type: recall_at_10
- value: 1.8399999999999999
- - type: recall_at_100
- value: 11.971
- - type: recall_at_1000
- value: 41.042
- - type: recall_at_3
- value: 0.632
- - type: recall_at_5
- value: 1.008
- - task:
- type: Retrieval
- dataset:
- type: webis-touche2020
- name: MTEB Touche2020
- config: default
- split: test
- revision: None
- metrics:
- - type: map_at_1
- value: 1.183
- - type: map_at_10
- value: 9.58
- - type: map_at_100
- value: 16.27
- - type: map_at_1000
- value: 17.977999999999998
- - type: map_at_3
- value: 4.521
- - type: map_at_5
- value: 6.567
- - type: mrr_at_1
- value: 12.245000000000001
- - type: mrr_at_10
- value: 33.486
- - type: mrr_at_100
- value: 34.989
- - type: mrr_at_1000
- value: 34.989
- - type: mrr_at_3
- value: 28.231
- - type: mrr_at_5
- value: 31.701
- - type: ndcg_at_1
- value: 9.184000000000001
- - type: ndcg_at_10
- value: 22.133
- - type: ndcg_at_100
- value: 36.882
- - type: ndcg_at_1000
- value: 48.487
- - type: ndcg_at_3
- value: 18.971
- - type: ndcg_at_5
- value: 20.107
- - type: precision_at_1
- value: 12.245000000000001
- - type: precision_at_10
- value: 21.837
- - type: precision_at_100
- value: 8.265
- - type: precision_at_1000
- value: 1.606
- - type: precision_at_3
- value: 22.448999999999998
- - type: precision_at_5
- value: 23.265
- - type: recall_at_1
- value: 1.183
- - type: recall_at_10
- value: 17.01
- - type: recall_at_100
- value: 51.666000000000004
- - type: recall_at_1000
- value: 87.56
- - type: recall_at_3
- value: 6.0280000000000005
- - type: recall_at_5
- value: 9.937999999999999
- - task:
- type: Classification
- dataset:
- type: mteb/toxic_conversations_50k
- name: MTEB ToxicConversationsClassification
- config: default
- split: test
- revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
- metrics:
- - type: accuracy
- value: 70.6812
- - type: ap
- value: 13.776718216594006
- - type: f1
- value: 54.14269849375851
- - task:
- type: Classification
- dataset:
- type: mteb/tweet_sentiment_extraction
- name: MTEB TweetSentimentExtractionClassification
- config: default
- split: test
- revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
- metrics:
- - type: accuracy
- value: 57.3372948500283
- - type: f1
- value: 57.39381291375
- - task:
- type: Clustering
- dataset:
- type: mteb/twentynewsgroups-clustering
- name: MTEB TwentyNewsgroupsClustering
- config: default
- split: test
- revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
- metrics:
- - type: v_measure
- value: 41.49681931876514
- - task:
- type: PairClassification
- dataset:
- type: mteb/twittersemeval2015-pairclassification
- name: MTEB TwitterSemEval2015
- config: default
- split: test
- revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
- metrics:
- - type: cos_sim_accuracy
- value: 84.65756690707516
- - type: cos_sim_ap
- value: 70.06190309300052
- - type: cos_sim_f1
- value: 65.49254432311848
- - type: cos_sim_precision
- value: 59.00148085466469
- - type: cos_sim_recall
- value: 73.58839050131925
- - type: dot_accuracy
- value: 84.65756690707516
- - type: dot_ap
- value: 70.06187157356817
- - type: dot_f1
- value: 65.49254432311848
- - type: dot_precision
- value: 59.00148085466469
- - type: dot_recall
- value: 73.58839050131925
- - type: euclidean_accuracy
- value: 84.65756690707516
- - type: euclidean_ap
- value: 70.06190439203068
- - type: euclidean_f1
- value: 65.49254432311848
- - type: euclidean_precision
- value: 59.00148085466469
- - type: euclidean_recall
- value: 73.58839050131925
- - type: manhattan_accuracy
- value: 84.58604041246946
- - type: manhattan_ap
- value: 69.93103436414437
- - type: manhattan_f1
- value: 65.48780487804878
- - type: manhattan_precision
- value: 60.8843537414966
- - type: manhattan_recall
- value: 70.84432717678101
- - type: max_accuracy
- value: 84.65756690707516
- - type: max_ap
- value: 70.06190439203068
- - type: max_f1
- value: 65.49254432311848
- - task:
- type: PairClassification
- dataset:
- type: mteb/twitterurlcorpus-pairclassification
- name: MTEB TwitterURLCorpus
- config: default
- split: test
- revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
- metrics:
- - type: cos_sim_accuracy
- value: 88.78410369852912
- - type: cos_sim_ap
- value: 85.45825760499459
- - type: cos_sim_f1
- value: 77.73455035163849
- - type: cos_sim_precision
- value: 75.5966239813737
- - type: cos_sim_recall
- value: 79.9969202340622
- - type: dot_accuracy
- value: 88.78410369852912
- - type: dot_ap
- value: 85.45825790635979
- - type: dot_f1
- value: 77.73455035163849
- - type: dot_precision
- value: 75.5966239813737
- - type: dot_recall
- value: 79.9969202340622
- - type: euclidean_accuracy
- value: 88.78410369852912
- - type: euclidean_ap
- value: 85.45826341243391
- - type: euclidean_f1
- value: 77.73455035163849
- - type: euclidean_precision
- value: 75.5966239813737
- - type: euclidean_recall
- value: 79.9969202340622
- - type: manhattan_accuracy
- value: 88.7026041060271
- - type: manhattan_ap
- value: 85.43182830781821
- - type: manhattan_f1
- value: 77.61487303506651
- - type: manhattan_precision
- value: 76.20955773226477
- - type: manhattan_recall
- value: 79.07299045272559
- - type: max_accuracy
- value: 88.78410369852912
- - type: max_ap
- value: 85.45826341243391
- - type: max_f1
- value: 77.73455035163849
- ---
 
  pipeline_tag: sentence-similarity
  tags:
  - finetuner
+ - mteb
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
+ - alibi
  datasets:
+ - allenai/c4
  language: en
  license: apache-2.0
  model-index:
+ - name: jina-embedding-s-en-v2
+ results: []
+ ---
+ <!-- TODO: add evaluation results here -->
+ <br><br>
+
+ <p align="center">
+ <img src="https://github.com/jina-ai/finetuner/blob/main/docs/_static/finetuner-logo-ani.svg?raw=true" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
+ </p>
+
+
+ <p align="center">
+ <b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>'s <a href="https://github.com/jina-ai/finetuner"><b>Finetuner</b></a> team.</b>
+ </p>
+
+
+ ## Intended Usage & Model Info
+
+ `jina-embedding-s-en-v2` is an English, monolingual embedding model that supports a sequence length of up to 8k tokens.
+ It is based on a BERT architecture that uses the symmetric bidirectional variant of ALiBi to handle these longer sequences.
+ The backbone Jina BERT Small model is pretrained on the C4 dataset.
+ The model is further trained on Jina AI's collection of more than 40 datasets of sentence pairs and hard negatives.
+ These pairs were drawn from a variety of domains and carefully selected through a thorough cleaning process.
+
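+ To see why ALiBi extrapolates, note that it replaces positional embeddings with a linear attention bias that grows with token distance. A sketch of the symmetric (bidirectional) formulation, where $m_h$ is the fixed per-head slope from the ALiBi paper and $d$ the head dimension:
+
+ $$
+ \mathrm{score}(q_i, k_j) = \frac{q_i \cdot k_j}{\sqrt{d}} - m_h \, \lvert i - j \rvert
+ $$
+
+ Because the bias depends only on the relative distance $\lvert i - j \rvert$, the same weights remain well-defined at positions far beyond those seen during training.
+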
+ The embedding model was trained with a sequence length of 512, but extrapolates to an 8k sequence length thanks to ALiBi.
+ This makes our model useful for a range of applications that involve long documents, including long-document retrieval, semantic textual similarity, text reranking, recommendation, and RAG / LLM-based generative search.
+
+ This model has 33 million parameters, which enables fast, memory-efficient inference on long documents while still delivering impressive performance.
+ We also provide the following embedding models, which support the same 8k sequence length:
+
+ - [`jina-embedding-s-en-v2`](https://huggingface.co/jinaai/jina-embedding-s-en-v2): 33 million parameters **(you are here)**.
+ - [`jina-embedding-b-en-v2`](https://huggingface.co/jinaai/jina-embedding-b-en-v2): 137 million parameters.
+ - [`jina-embedding-l-en-v2`](https://huggingface.co/jinaai/jina-embedding-l-en-v2): 435 million parameters.
+
+ ## Data & Parameters
+
+ Please check out our [technical report](https://arxiv.org/abs/2307.11224) for details on the training data and model parameters.
+
+ ## Metrics
+
+ We compared the model against `all-MiniLM-L6-v2` and `all-mpnet-base-v2` from Sentence-Transformers (SBERT) and `text-embedding-ada-002` from OpenAI:
+
+ <!-- TODO: add evaluation table here -->
+
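+ Scores of this kind are typically produced with the [MTEB](https://github.com/embeddings-benchmark/mteb) benchmark suite (note the `mteb` tag above). The snippet below is a rough sketch of how such numbers can be reproduced with the `mteb` package (`pip install mteb`); the task selection is purely illustrative, and it assumes the model's `encode` method is compatible with the SentenceTransformers-style interface that MTEB expects:
+ ```python
+ # Sketch: reproducing MTEB-style scores. The task selection here is illustrative only.
+ from mteb import MTEB
+ from transformers import AutoModel
+
+ model = AutoModel.from_pretrained('jinaai/jina-embedding-s-en-v2', trust_remote_code=True)
+
+ # MTEB calls model.encode(sentences, batch_size=..., **kwargs); the model's custom
+ # encode method is assumed to be compatible with this interface.
+ evaluation = MTEB(tasks=["STSBenchmark", "SciFact"])
+ evaluation.run(model, output_folder="results/jina-embedding-s-en-v2")
+ ```
+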
+ ## Usage
+
+ You can use Jina Embedding models directly with the `transformers` package (`pip install transformers`):
+ ```python
+ from transformers import AutoModel
+ from numpy.linalg import norm
+
+ # Cosine similarity between two embedding vectors.
+ cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))
+ model = AutoModel.from_pretrained('jinaai/jina-embedding-s-en-v2', trust_remote_code=True)  # trust_remote_code is needed to use the encode method
+ embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
+ print(cos_sim(embeddings[0], embeddings[1]))
+ ```
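+
+ Because ALiBi lets the model extrapolate well beyond the 512-token training length, long inputs go through exactly the same `encode` call; the snippet below is a minimal illustration, with the repeated sample text standing in for a real document:
+ ```python
+ from transformers import AutoModel
+ from numpy.linalg import norm
+
+ cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))
+ model = AutoModel.from_pretrained('jinaai/jina-embedding-s-en-v2', trust_remote_code=True)
+
+ # Stand-in for a real long document: a few thousand tokens built by repetition.
+ paragraph = "ALiBi biases attention by token distance, so the model can read far past 512 tokens. "
+ long_document = paragraph * 200
+ query = "How does the model handle long inputs?"
+
+ doc_embedding, query_embedding = model.encode([long_document, query])
+ print(cos_sim(doc_embedding, query_embedding))
+ ```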
+
+ For long sequences, we recommend running inference with Flash Attention, which lets you increase the batch size and throughput at long sequence lengths.
+ An experimental Flash Attention implementation ships with the model.
+ Install the following Triton version:
+ `pip install triton==2.0.0.dev20221202`.
+ Then run the same code as above, but set the parameter `with_flash` to `True` when you load the model; you also have to use either `fp16` or `bf16`:
+ ```python
+ from transformers import AutoModel
+ from numpy.linalg import norm
+ import torch
+
+ # Cosine similarity between two embedding vectors.
+ cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))
+ # fp16 (or bf16) is required for the Flash Attention path.
+ model = AutoModel.from_pretrained('jinaai/jina-embedding-s-en-v2', trust_remote_code=True, with_flash=True, torch_dtype=torch.float16).cuda()  # trust_remote_code is needed to use the encode method
+ embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
+ print(cos_sim(embeddings[0], embeddings[1]))
+ ```
+
+ ## Fine-tuning
+
+ To fine-tune the model on your own data, please consider [Finetuner](https://github.com/jina-ai/finetuner).
+
+ ## Plans
+
+ The development of new multilingual models is currently underway; we will mainly target German and Spanish. The upcoming models will be called `jina-embedding-s/b/l-de/es-v2`.
+
+ ## Contact
+
+ Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
+
+ ## Citation
+
+ If you find Jina Embeddings useful in your research, please cite the following paper:
+
+ <!-- TODO: update the paper ID once it is published on arxiv -->
+ ```latex
+ @misc{günther2023jina,
+ title={Beyond the 512-Token Barrier: Training General-Purpose Text Embeddings for Large Documents},
+ author={Michael Günther and Jackmin Ong and Isabelle Mohr and Alaeddine Abdessalem and Tanguy Abel and Mohammad Kalim Akram and Susana Guzman and Georgios Mastrapas and Saba Sturua and Bo Wang},
+ year={2023},
+ eprint={2307.11224},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```