michaelfeil committed
Commit b6f0cde
1 Parent(s): 3e84a86

Update README.md

Files changed (1)
  1. README.md +12 -3022
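Each entry in the `model-index` block removed below follows the Hugging Face model-index schema: one task/dataset pair plus a list of metrics. For orientation, here is the first removed entry reassembled as standalone YAML; the values are copied verbatim from the diff, while the indentation follows the standard model-index layout (an assumption, since the capture flattened leading whitespace):

```yaml
model-index:
- name: bge-small-en-v1.5
  results:
  - task:                  # one result entry: task type + dataset + metric list
      type: Classification
    dataset:
      type: mteb/amazon_counterfactual
      name: MTEB AmazonCounterfactualClassification (en)
      config: en
      split: test
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
    metrics:
    - type: accuracy
      value: 73.79104477611939
    - type: ap
      value: 37.21923821573361
    - type: f1
      value: 68.0914945617093
```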
README.md CHANGED
@@ -4,2940 +4,24 @@ tags:
 - feature-extraction
 - sentence-similarity
 - transformers
-- mteb
-model-index:
-- name: bge-small-en-v1.5
-  results:
-  - task:
-      type: Classification
-    dataset:
-      type: mteb/amazon_counterfactual
-      name: MTEB AmazonCounterfactualClassification (en)
-      config: en
-      split: test
-      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
-    metrics:
-    - type: accuracy
-      value: 73.79104477611939
-    - type: ap
-      value: 37.21923821573361
-    - type: f1
-      value: 68.0914945617093
-  - task:
-      type: Classification
-    dataset:
-      type: mteb/amazon_polarity
-      name: MTEB AmazonPolarityClassification
-      config: default
-      split: test
-      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
-    metrics:
-    - type: accuracy
-      value: 92.75377499999999
-    - type: ap
-      value: 89.46766124546022
-    - type: f1
-      value: 92.73884001331487
-  - task:
-      type: Classification
-    dataset:
-      type: mteb/amazon_reviews_multi
-      name: MTEB AmazonReviewsClassification (en)
-      config: en
-      split: test
-      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
-    metrics:
-    - type: accuracy
-      value: 46.986
-    - type: f1
-      value: 46.55936786727896
-  - task:
-      type: Retrieval
-    dataset:
-      type: arguana
-      name: MTEB ArguAna
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 35.846000000000004
-    - type: map_at_10
-      value: 51.388
-    - type: map_at_100
-      value: 52.132999999999996
-    - type: map_at_1000
-      value: 52.141000000000005
-    - type: map_at_3
-      value: 47.037
-    - type: map_at_5
-      value: 49.579
-    - type: mrr_at_1
-      value: 36.558
-    - type: mrr_at_10
-      value: 51.658
-    - type: mrr_at_100
-      value: 52.402
-    - type: mrr_at_1000
-      value: 52.410000000000004
-    - type: mrr_at_3
-      value: 47.345
-    - type: mrr_at_5
-      value: 49.797999999999995
-    - type: ndcg_at_1
-      value: 35.846000000000004
-    - type: ndcg_at_10
-      value: 59.550000000000004
-    - type: ndcg_at_100
-      value: 62.596
-    - type: ndcg_at_1000
-      value: 62.759
-    - type: ndcg_at_3
-      value: 50.666999999999994
-    - type: ndcg_at_5
-      value: 55.228
-    - type: precision_at_1
-      value: 35.846000000000004
-    - type: precision_at_10
-      value: 8.542
-    - type: precision_at_100
-      value: 0.984
-    - type: precision_at_1000
-      value: 0.1
-    - type: precision_at_3
-      value: 20.389
-    - type: precision_at_5
-      value: 14.438
-    - type: recall_at_1
-      value: 35.846000000000004
-    - type: recall_at_10
-      value: 85.42
-    - type: recall_at_100
-      value: 98.43499999999999
-    - type: recall_at_1000
-      value: 99.644
-    - type: recall_at_3
-      value: 61.166
-    - type: recall_at_5
-      value: 72.191
-  - task:
-      type: Clustering
-    dataset:
-      type: mteb/arxiv-clustering-p2p
-      name: MTEB ArxivClusteringP2P
-      config: default
-      split: test
-      revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
-    metrics:
-    - type: v_measure
-      value: 47.402770198163594
-  - task:
-      type: Clustering
-    dataset:
-      type: mteb/arxiv-clustering-s2s
-      name: MTEB ArxivClusteringS2S
-      config: default
-      split: test
-      revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
-    metrics:
-    - type: v_measure
-      value: 40.01545436974177
-  - task:
-      type: Reranking
-    dataset:
-      type: mteb/askubuntudupquestions-reranking
-      name: MTEB AskUbuntuDupQuestions
-      config: default
-      split: test
-      revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
-    metrics:
-    - type: map
-      value: 62.586465273207196
-    - type: mrr
-      value: 74.42169019038825
-  - task:
-      type: STS
-    dataset:
-      type: mteb/biosses-sts
-      name: MTEB BIOSSES
-      config: default
-      split: test
-      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
-    metrics:
-    - type: cos_sim_pearson
-      value: 85.1891186537969
-    - type: cos_sim_spearman
-      value: 83.75492046087288
-    - type: euclidean_pearson
-      value: 84.11766204805357
-    - type: euclidean_spearman
-      value: 84.01456493126516
-    - type: manhattan_pearson
-      value: 84.2132950502772
-    - type: manhattan_spearman
-      value: 83.89227298813377
-  - task:
-      type: Classification
-    dataset:
-      type: mteb/banking77
-      name: MTEB Banking77Classification
-      config: default
-      split: test
-      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
-    metrics:
-    - type: accuracy
-      value: 85.74025974025975
-    - type: f1
-      value: 85.71493566466381
-  - task:
-      type: Clustering
-    dataset:
-      type: mteb/biorxiv-clustering-p2p
-      name: MTEB BiorxivClusteringP2P
-      config: default
-      split: test
-      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
-    metrics:
-    - type: v_measure
-      value: 38.467181385006434
-  - task:
-      type: Clustering
-    dataset:
-      type: mteb/biorxiv-clustering-s2s
-      name: MTEB BiorxivClusteringS2S
-      config: default
-      split: test
-      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
-    metrics:
-    - type: v_measure
-      value: 34.719496037339056
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackAndroidRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 29.587000000000003
-    - type: map_at_10
-      value: 41.114
-    - type: map_at_100
-      value: 42.532
-    - type: map_at_1000
-      value: 42.661
-    - type: map_at_3
-      value: 37.483
-    - type: map_at_5
-      value: 39.652
-    - type: mrr_at_1
-      value: 36.338
-    - type: mrr_at_10
-      value: 46.763
-    - type: mrr_at_100
-      value: 47.393
-    - type: mrr_at_1000
-      value: 47.445
-    - type: mrr_at_3
-      value: 43.538
-    - type: mrr_at_5
-      value: 45.556000000000004
-    - type: ndcg_at_1
-      value: 36.338
-    - type: ndcg_at_10
-      value: 47.658
-    - type: ndcg_at_100
-      value: 52.824000000000005
-    - type: ndcg_at_1000
-      value: 54.913999999999994
-    - type: ndcg_at_3
-      value: 41.989
-    - type: ndcg_at_5
-      value: 44.944
-    - type: precision_at_1
-      value: 36.338
-    - type: precision_at_10
-      value: 9.156
-    - type: precision_at_100
-      value: 1.4789999999999999
-    - type: precision_at_1000
-      value: 0.196
-    - type: precision_at_3
-      value: 20.076
-    - type: precision_at_5
-      value: 14.85
-    - type: recall_at_1
-      value: 29.587000000000003
-    - type: recall_at_10
-      value: 60.746
-    - type: recall_at_100
-      value: 82.157
-    - type: recall_at_1000
-      value: 95.645
-    - type: recall_at_3
-      value: 44.821
-    - type: recall_at_5
-      value: 52.819
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackEnglishRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 30.239
-    - type: map_at_10
-      value: 39.989000000000004
-    - type: map_at_100
-      value: 41.196
-    - type: map_at_1000
-      value: 41.325
-    - type: map_at_3
-      value: 37.261
-    - type: map_at_5
-      value: 38.833
-    - type: mrr_at_1
-      value: 37.516
-    - type: mrr_at_10
-      value: 46.177
-    - type: mrr_at_100
-      value: 46.806
-    - type: mrr_at_1000
-      value: 46.849000000000004
-    - type: mrr_at_3
-      value: 44.002
-    - type: mrr_at_5
-      value: 45.34
-    - type: ndcg_at_1
-      value: 37.516
-    - type: ndcg_at_10
-      value: 45.586
-    - type: ndcg_at_100
-      value: 49.897000000000006
-    - type: ndcg_at_1000
-      value: 51.955
-    - type: ndcg_at_3
-      value: 41.684
-    - type: ndcg_at_5
-      value: 43.617
-    - type: precision_at_1
-      value: 37.516
-    - type: precision_at_10
-      value: 8.522
-    - type: precision_at_100
-      value: 1.374
-    - type: precision_at_1000
-      value: 0.184
-    - type: precision_at_3
-      value: 20.105999999999998
-    - type: precision_at_5
-      value: 14.152999999999999
-    - type: recall_at_1
-      value: 30.239
-    - type: recall_at_10
-      value: 55.03
-    - type: recall_at_100
-      value: 73.375
-    - type: recall_at_1000
-      value: 86.29599999999999
-    - type: recall_at_3
-      value: 43.269000000000005
-    - type: recall_at_5
-      value: 48.878
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackGamingRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 38.338
-    - type: map_at_10
-      value: 50.468999999999994
-    - type: map_at_100
-      value: 51.553000000000004
-    - type: map_at_1000
-      value: 51.608
-    - type: map_at_3
-      value: 47.107
-    - type: map_at_5
-      value: 49.101
-    - type: mrr_at_1
-      value: 44.201
-    - type: mrr_at_10
-      value: 54.057
-    - type: mrr_at_100
-      value: 54.764
-    - type: mrr_at_1000
-      value: 54.791000000000004
-    - type: mrr_at_3
-      value: 51.56699999999999
-    - type: mrr_at_5
-      value: 53.05
-    - type: ndcg_at_1
-      value: 44.201
-    - type: ndcg_at_10
-      value: 56.379000000000005
-    - type: ndcg_at_100
-      value: 60.645
-    - type: ndcg_at_1000
-      value: 61.73499999999999
-    - type: ndcg_at_3
-      value: 50.726000000000006
-    - type: ndcg_at_5
-      value: 53.58500000000001
-    - type: precision_at_1
-      value: 44.201
-    - type: precision_at_10
-      value: 9.141
-    - type: precision_at_100
-      value: 1.216
-    - type: precision_at_1000
-      value: 0.135
-    - type: precision_at_3
-      value: 22.654
-    - type: precision_at_5
-      value: 15.723999999999998
-    - type: recall_at_1
-      value: 38.338
-    - type: recall_at_10
-      value: 70.30499999999999
-    - type: recall_at_100
-      value: 88.77199999999999
-    - type: recall_at_1000
-      value: 96.49799999999999
-    - type: recall_at_3
-      value: 55.218
-    - type: recall_at_5
-      value: 62.104000000000006
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackGisRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 25.682
-    - type: map_at_10
-      value: 33.498
-    - type: map_at_100
-      value: 34.461000000000006
-    - type: map_at_1000
-      value: 34.544000000000004
-    - type: map_at_3
-      value: 30.503999999999998
-    - type: map_at_5
-      value: 32.216
-    - type: mrr_at_1
-      value: 27.683999999999997
-    - type: mrr_at_10
-      value: 35.467999999999996
-    - type: mrr_at_100
-      value: 36.32
-    - type: mrr_at_1000
-      value: 36.386
-    - type: mrr_at_3
-      value: 32.618
-    - type: mrr_at_5
-      value: 34.262
-    - type: ndcg_at_1
-      value: 27.683999999999997
-    - type: ndcg_at_10
-      value: 38.378
-    - type: ndcg_at_100
-      value: 43.288
-    - type: ndcg_at_1000
-      value: 45.413
-    - type: ndcg_at_3
-      value: 32.586
-    - type: ndcg_at_5
-      value: 35.499
-    - type: precision_at_1
-      value: 27.683999999999997
-    - type: precision_at_10
-      value: 5.864
-    - type: precision_at_100
-      value: 0.882
-    - type: precision_at_1000
-      value: 0.11
-    - type: precision_at_3
-      value: 13.446
-    - type: precision_at_5
-      value: 9.718
-    - type: recall_at_1
-      value: 25.682
-    - type: recall_at_10
-      value: 51.712
-    - type: recall_at_100
-      value: 74.446
-    - type: recall_at_1000
-      value: 90.472
-    - type: recall_at_3
-      value: 36.236000000000004
-    - type: recall_at_5
-      value: 43.234
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackMathematicaRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 16.073999999999998
-    - type: map_at_10
-      value: 24.352999999999998
-    - type: map_at_100
-      value: 25.438
-    - type: map_at_1000
-      value: 25.545
-    - type: map_at_3
-      value: 21.614
-    - type: map_at_5
-      value: 23.104
-    - type: mrr_at_1
-      value: 19.776
-    - type: mrr_at_10
-      value: 28.837000000000003
-    - type: mrr_at_100
-      value: 29.755
-    - type: mrr_at_1000
-      value: 29.817
-    - type: mrr_at_3
-      value: 26.201999999999998
-    - type: mrr_at_5
-      value: 27.714
-    - type: ndcg_at_1
-      value: 19.776
-    - type: ndcg_at_10
-      value: 29.701
-    - type: ndcg_at_100
-      value: 35.307
-    - type: ndcg_at_1000
-      value: 37.942
-    - type: ndcg_at_3
-      value: 24.764
-    - type: ndcg_at_5
-      value: 27.025
-    - type: precision_at_1
-      value: 19.776
-    - type: precision_at_10
-      value: 5.659
-    - type: precision_at_100
-      value: 0.971
-    - type: precision_at_1000
-      value: 0.133
-    - type: precision_at_3
-      value: 12.065
-    - type: precision_at_5
-      value: 8.905000000000001
-    - type: recall_at_1
-      value: 16.073999999999998
-    - type: recall_at_10
-      value: 41.647
-    - type: recall_at_100
-      value: 66.884
-    - type: recall_at_1000
-      value: 85.91499999999999
-    - type: recall_at_3
-      value: 27.916
-    - type: recall_at_5
-      value: 33.729
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackPhysicsRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 28.444999999999997
-    - type: map_at_10
-      value: 38.218999999999994
-    - type: map_at_100
-      value: 39.595
-    - type: map_at_1000
-      value: 39.709
-    - type: map_at_3
-      value: 35.586
-    - type: map_at_5
-      value: 36.895
-    - type: mrr_at_1
-      value: 34.841
-    - type: mrr_at_10
-      value: 44.106
-    - type: mrr_at_100
-      value: 44.98
-    - type: mrr_at_1000
-      value: 45.03
-    - type: mrr_at_3
-      value: 41.979
-    - type: mrr_at_5
-      value: 43.047999999999995
-    - type: ndcg_at_1
-      value: 34.841
-    - type: ndcg_at_10
-      value: 43.922
-    - type: ndcg_at_100
-      value: 49.504999999999995
-    - type: ndcg_at_1000
-      value: 51.675000000000004
-    - type: ndcg_at_3
-      value: 39.858
-    - type: ndcg_at_5
-      value: 41.408
-    - type: precision_at_1
-      value: 34.841
-    - type: precision_at_10
-      value: 7.872999999999999
-    - type: precision_at_100
-      value: 1.2449999999999999
-    - type: precision_at_1000
-      value: 0.161
-    - type: precision_at_3
-      value: 18.993
-    - type: precision_at_5
-      value: 13.032
-    - type: recall_at_1
-      value: 28.444999999999997
-    - type: recall_at_10
-      value: 54.984
-    - type: recall_at_100
-      value: 78.342
-    - type: recall_at_1000
-      value: 92.77
-    - type: recall_at_3
-      value: 42.842999999999996
-    - type: recall_at_5
-      value: 47.247
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackProgrammersRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 23.072
-    - type: map_at_10
-      value: 32.354
-    - type: map_at_100
-      value: 33.800000000000004
-    - type: map_at_1000
-      value: 33.908
-    - type: map_at_3
-      value: 29.232000000000003
-    - type: map_at_5
-      value: 31.049
-    - type: mrr_at_1
-      value: 29.110000000000003
-    - type: mrr_at_10
-      value: 38.03
-    - type: mrr_at_100
-      value: 39.032
-    - type: mrr_at_1000
-      value: 39.086999999999996
-    - type: mrr_at_3
-      value: 35.407
-    - type: mrr_at_5
-      value: 36.76
-    - type: ndcg_at_1
-      value: 29.110000000000003
-    - type: ndcg_at_10
-      value: 38.231
-    - type: ndcg_at_100
-      value: 44.425
-    - type: ndcg_at_1000
-      value: 46.771
-    - type: ndcg_at_3
-      value: 33.095
-    - type: ndcg_at_5
-      value: 35.459
-    - type: precision_at_1
-      value: 29.110000000000003
-    - type: precision_at_10
-      value: 7.215000000000001
-    - type: precision_at_100
-      value: 1.2109999999999999
-    - type: precision_at_1000
-      value: 0.157
-    - type: precision_at_3
-      value: 16.058
-    - type: precision_at_5
-      value: 11.644
-    - type: recall_at_1
-      value: 23.072
-    - type: recall_at_10
-      value: 50.285999999999994
-    - type: recall_at_100
-      value: 76.596
-    - type: recall_at_1000
-      value: 92.861
-    - type: recall_at_3
-      value: 35.702
-    - type: recall_at_5
-      value: 42.152
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 24.937916666666666
-    - type: map_at_10
-      value: 33.755250000000004
-    - type: map_at_100
-      value: 34.955999999999996
-    - type: map_at_1000
-      value: 35.070499999999996
-    - type: map_at_3
-      value: 30.98708333333333
-    - type: map_at_5
-      value: 32.51491666666666
-    - type: mrr_at_1
-      value: 29.48708333333333
-    - type: mrr_at_10
-      value: 37.92183333333334
-    - type: mrr_at_100
-      value: 38.76583333333333
-    - type: mrr_at_1000
-      value: 38.82466666666667
-    - type: mrr_at_3
-      value: 35.45125
-    - type: mrr_at_5
-      value: 36.827000000000005
-    - type: ndcg_at_1
-      value: 29.48708333333333
-    - type: ndcg_at_10
-      value: 39.05225
-    - type: ndcg_at_100
-      value: 44.25983333333334
-    - type: ndcg_at_1000
-      value: 46.568333333333335
-    - type: ndcg_at_3
-      value: 34.271583333333325
-    - type: ndcg_at_5
-      value: 36.483916666666666
-    - type: precision_at_1
-      value: 29.48708333333333
-    - type: precision_at_10
-      value: 6.865749999999999
-    - type: precision_at_100
-      value: 1.1195833333333332
-    - type: precision_at_1000
-      value: 0.15058333333333335
-    - type: precision_at_3
-      value: 15.742083333333333
-    - type: precision_at_5
-      value: 11.221916666666667
-    - type: recall_at_1
-      value: 24.937916666666666
-    - type: recall_at_10
-      value: 50.650416666666665
-    - type: recall_at_100
-      value: 73.55383333333334
-    - type: recall_at_1000
-      value: 89.61691666666667
-    - type: recall_at_3
-      value: 37.27808333333334
-    - type: recall_at_5
-      value: 42.99475
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackStatsRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 23.947
-    - type: map_at_10
-      value: 30.575000000000003
-    - type: map_at_100
-      value: 31.465
-    - type: map_at_1000
-      value: 31.558000000000003
-    - type: map_at_3
-      value: 28.814
-    - type: map_at_5
-      value: 29.738999999999997
-    - type: mrr_at_1
-      value: 26.994
-    - type: mrr_at_10
-      value: 33.415
-    - type: mrr_at_100
-      value: 34.18
-    - type: mrr_at_1000
-      value: 34.245
-    - type: mrr_at_3
-      value: 31.621
-    - type: mrr_at_5
-      value: 32.549
-    - type: ndcg_at_1
-      value: 26.994
-    - type: ndcg_at_10
-      value: 34.482
-    - type: ndcg_at_100
-      value: 38.915
-    - type: ndcg_at_1000
-      value: 41.355
-    - type: ndcg_at_3
-      value: 31.139
-    - type: ndcg_at_5
-      value: 32.589
-    - type: precision_at_1
-      value: 26.994
-    - type: precision_at_10
-      value: 5.322
-    - type: precision_at_100
-      value: 0.8160000000000001
-    - type: precision_at_1000
-      value: 0.11100000000000002
-    - type: precision_at_3
-      value: 13.344000000000001
-    - type: precision_at_5
-      value: 8.988
-    - type: recall_at_1
-      value: 23.947
-    - type: recall_at_10
-      value: 43.647999999999996
-    - type: recall_at_100
-      value: 63.851
-    - type: recall_at_1000
-      value: 82.0
-    - type: recall_at_3
-      value: 34.288000000000004
-    - type: recall_at_5
-      value: 38.117000000000004
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackTexRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 16.197
-    - type: map_at_10
-      value: 22.968
-    - type: map_at_100
-      value: 24.095
-    - type: map_at_1000
-      value: 24.217
-    - type: map_at_3
-      value: 20.771
-    - type: map_at_5
-      value: 21.995
-    - type: mrr_at_1
-      value: 19.511
-    - type: mrr_at_10
-      value: 26.55
-    - type: mrr_at_100
-      value: 27.500999999999998
-    - type: mrr_at_1000
-      value: 27.578999999999997
-    - type: mrr_at_3
-      value: 24.421
-    - type: mrr_at_5
-      value: 25.604
-    - type: ndcg_at_1
-      value: 19.511
-    - type: ndcg_at_10
-      value: 27.386
-    - type: ndcg_at_100
-      value: 32.828
-    - type: ndcg_at_1000
-      value: 35.739
-    - type: ndcg_at_3
-      value: 23.405
-    - type: ndcg_at_5
-      value: 25.255
-    - type: precision_at_1
-      value: 19.511
-    - type: precision_at_10
-      value: 5.017
-    - type: precision_at_100
-      value: 0.91
-    - type: precision_at_1000
-      value: 0.133
-    - type: precision_at_3
-      value: 11.023
-    - type: precision_at_5
-      value: 8.025
-    - type: recall_at_1
-      value: 16.197
-    - type: recall_at_10
-      value: 37.09
-    - type: recall_at_100
-      value: 61.778
-    - type: recall_at_1000
-      value: 82.56599999999999
-    - type: recall_at_3
-      value: 26.034000000000002
-    - type: recall_at_5
-      value: 30.762
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackUnixRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 25.41
-    - type: map_at_10
-      value: 33.655
-    - type: map_at_100
-      value: 34.892
-    - type: map_at_1000
-      value: 34.995
-    - type: map_at_3
-      value: 30.94
-    - type: map_at_5
-      value: 32.303
-    - type: mrr_at_1
-      value: 29.477999999999998
-    - type: mrr_at_10
-      value: 37.443
-    - type: mrr_at_100
-      value: 38.383
-    - type: mrr_at_1000
-      value: 38.440000000000005
-    - type: mrr_at_3
-      value: 34.949999999999996
-    - type: mrr_at_5
-      value: 36.228
-    - type: ndcg_at_1
-      value: 29.477999999999998
-    - type: ndcg_at_10
-      value: 38.769
-    - type: ndcg_at_100
-      value: 44.245000000000005
-    - type: ndcg_at_1000
-      value: 46.593
-    - type: ndcg_at_3
-      value: 33.623
-    - type: ndcg_at_5
-      value: 35.766
-    - type: precision_at_1
-      value: 29.477999999999998
-    - type: precision_at_10
-      value: 6.455
-    - type: precision_at_100
-      value: 1.032
-    - type: precision_at_1000
-      value: 0.135
-    - type: precision_at_3
-      value: 14.893999999999998
-    - type: precision_at_5
-      value: 10.485
-    - type: recall_at_1
-      value: 25.41
-    - type: recall_at_10
-      value: 50.669
-    - type: recall_at_100
-      value: 74.084
-    - type: recall_at_1000
-      value: 90.435
-    - type: recall_at_3
-      value: 36.679
-    - type: recall_at_5
-      value: 41.94
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackWebmastersRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 23.339
-    - type: map_at_10
-      value: 31.852000000000004
-    - type: map_at_100
-      value: 33.411
-    - type: map_at_1000
-      value: 33.62
-    - type: map_at_3
-      value: 28.929
-    - type: map_at_5
-      value: 30.542
-    - type: mrr_at_1
-      value: 28.063
-    - type: mrr_at_10
-      value: 36.301
-    - type: mrr_at_100
-      value: 37.288
-    - type: mrr_at_1000
-      value: 37.349
-    - type: mrr_at_3
-      value: 33.663
-    - type: mrr_at_5
-      value: 35.165
-    - type: ndcg_at_1
-      value: 28.063
-    - type: ndcg_at_10
-      value: 37.462
-    - type: ndcg_at_100
-      value: 43.620999999999995
-    - type: ndcg_at_1000
-      value: 46.211
-    - type: ndcg_at_3
-      value: 32.68
-    - type: ndcg_at_5
-      value: 34.981
-    - type: precision_at_1
-      value: 28.063
-    - type: precision_at_10
-      value: 7.1739999999999995
-    - type: precision_at_100
-      value: 1.486
-    - type: precision_at_1000
-      value: 0.23500000000000001
-    - type: precision_at_3
-      value: 15.217
-    - type: precision_at_5
-      value: 11.265
-    - type: recall_at_1
-      value: 23.339
-    - type: recall_at_10
-      value: 48.376999999999995
-    - type: recall_at_100
-      value: 76.053
-    - type: recall_at_1000
-      value: 92.455
-    - type: recall_at_3
-      value: 34.735
-    - type: recall_at_5
-      value: 40.71
-  - task:
-      type: Retrieval
-    dataset:
-      type: BeIR/cqadupstack
-      name: MTEB CQADupstackWordpressRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 18.925
-    - type: map_at_10
-      value: 26.017000000000003
-    - type: map_at_100
-      value: 27.034000000000002
-    - type: map_at_1000
-      value: 27.156000000000002
-    - type: map_at_3
-      value: 23.604
-    - type: map_at_5
-      value: 24.75
-    - type: mrr_at_1
-      value: 20.333000000000002
-    - type: mrr_at_10
-      value: 27.915
-    - type: mrr_at_100
-      value: 28.788000000000004
-    - type: mrr_at_1000
-      value: 28.877999999999997
-    - type: mrr_at_3
-      value: 25.446999999999996
-    - type: mrr_at_5
-      value: 26.648
-    - type: ndcg_at_1
-      value: 20.333000000000002
-    - type: ndcg_at_10
-      value: 30.673000000000002
-    - type: ndcg_at_100
-      value: 35.618
-    - type: ndcg_at_1000
-      value: 38.517
-    - type: ndcg_at_3
-      value: 25.71
-    - type: ndcg_at_5
-      value: 27.679
-    - type: precision_at_1
-      value: 20.333000000000002
-    - type: precision_at_10
-      value: 4.9910000000000005
-    - type: precision_at_100
-      value: 0.8130000000000001
-    - type: precision_at_1000
-      value: 0.117
-    - type: precision_at_3
-      value: 11.029
-    - type: precision_at_5
-      value: 7.8740000000000006
-    - type: recall_at_1
-      value: 18.925
-    - type: recall_at_10
-      value: 43.311
-    - type: recall_at_100
-      value: 66.308
-    - type: recall_at_1000
-      value: 87.49
-    - type: recall_at_3
-      value: 29.596
-    - type: recall_at_5
-      value: 34.245
-  - task:
-      type: Retrieval
-    dataset:
-      type: climate-fever
-      name: MTEB ClimateFEVER
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 13.714
-    - type: map_at_10
-      value: 23.194
-    - type: map_at_100
-      value: 24.976000000000003
-    - type: map_at_1000
-      value: 25.166
-    - type: map_at_3
-      value: 19.709
-    - type: map_at_5
-      value: 21.523999999999997
-    - type: mrr_at_1
-      value: 30.619000000000003
-    - type: mrr_at_10
-      value: 42.563
-    - type: mrr_at_100
-      value: 43.386
-    - type: mrr_at_1000
-      value: 43.423
-    - type: mrr_at_3
-      value: 39.555
-    - type: mrr_at_5
-      value: 41.268
-    - type: ndcg_at_1
-      value: 30.619000000000003
-    - type: ndcg_at_10
-      value: 31.836
-    - type: ndcg_at_100
-      value: 38.652
-    - type: ndcg_at_1000
-      value: 42.088
-    - type: ndcg_at_3
-      value: 26.733
-    - type: ndcg_at_5
-      value: 28.435
-    - type: precision_at_1
-      value: 30.619000000000003
-    - type: precision_at_10
-      value: 9.751999999999999
-    - type: precision_at_100
-      value: 1.71
-    - type: precision_at_1000
-      value: 0.23500000000000001
-    - type: precision_at_3
-      value: 19.935
-    - type: precision_at_5
-      value: 14.984
-    - type: recall_at_1
-      value: 13.714
-    - type: recall_at_10
-      value: 37.26
-    - type: recall_at_100
-      value: 60.546
-    - type: recall_at_1000
-      value: 79.899
-    - type: recall_at_3
-      value: 24.325
-    - type: recall_at_5
-      value: 29.725
-  - task:
-      type: Retrieval
-    dataset:
-      type: dbpedia-entity
-      name: MTEB DBPedia
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 8.462
-    - type: map_at_10
-      value: 18.637
-    - type: map_at_100
-      value: 26.131999999999998
-    - type: map_at_1000
-      value: 27.607
-    - type: map_at_3
-      value: 13.333
-    - type: map_at_5
-      value: 15.654000000000002
-    - type: mrr_at_1
-      value: 66.25
-    - type: mrr_at_10
-      value: 74.32600000000001
-    - type: mrr_at_100
-      value: 74.60900000000001
-    - type: mrr_at_1000
-      value: 74.62
-    - type: mrr_at_3
-      value: 72.667
-    - type: mrr_at_5
-      value: 73.817
-    - type: ndcg_at_1
-      value: 53.87499999999999
-    - type: ndcg_at_10
-      value: 40.028999999999996
-    - type: ndcg_at_100
-      value: 44.199
-    - type: ndcg_at_1000
-      value: 51.629999999999995
-    - type: ndcg_at_3
-      value: 44.113
-    - type: ndcg_at_5
-      value: 41.731
-    - type: precision_at_1
-      value: 66.25
-    - type: precision_at_10
-      value: 31.900000000000002
-    - type: precision_at_100
-      value: 10.043000000000001
-    - type: precision_at_1000
-      value: 1.926
-    - type: precision_at_3
-      value: 47.417
-    - type: precision_at_5
-      value: 40.65
-    - type: recall_at_1
-      value: 8.462
-    - type: recall_at_10
-      value: 24.293
-    - type: recall_at_100
-      value: 50.146
-    - type: recall_at_1000
-      value: 74.034
-    - type: recall_at_3
-      value: 14.967
-    - type: recall_at_5
-      value: 18.682000000000002
-  - task:
-      type: Classification
-    dataset:
-      type: mteb/emotion
-      name: MTEB EmotionClassification
-      config: default
-      split: test
-      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
-    metrics:
-    - type: accuracy
-      value: 47.84499999999999
-    - type: f1
-      value: 42.48106691979349
-  - task:
-      type: Retrieval
-    dataset:
-      type: fever
-      name: MTEB FEVER
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 74.034
-    - type: map_at_10
-      value: 82.76
-    - type: map_at_100
-      value: 82.968
-    - type: map_at_1000
-      value: 82.98299999999999
-    - type: map_at_3
-      value: 81.768
-    - type: map_at_5
-      value: 82.418
-    - type: mrr_at_1
-      value: 80.048
-    - type: mrr_at_10
-      value: 87.64999999999999
-    - type: mrr_at_100
-      value: 87.712
-    - type: mrr_at_1000
-      value: 87.713
-    - type: mrr_at_3
-      value: 87.01100000000001
-    - type: mrr_at_5
-      value: 87.466
-    - type: ndcg_at_1
-      value: 80.048
-    - type: ndcg_at_10
-      value: 86.643
-    - type: ndcg_at_100
-      value: 87.361
-    - type: ndcg_at_1000
-      value: 87.606
-    - type: ndcg_at_3
-      value: 85.137
-    - type: ndcg_at_5
-      value: 86.016
-    - type: precision_at_1
-      value: 80.048
-    - type: precision_at_10
-      value: 10.372
-    - type: precision_at_100
-      value: 1.093
-    - type: precision_at_1000
-      value: 0.11299999999999999
-    - type: precision_at_3
-      value: 32.638
-    - type: precision_at_5
-      value: 20.177
-    - type: recall_at_1
-      value: 74.034
-    - type: recall_at_10
-      value: 93.769
-    - type: recall_at_100
-      value: 96.569
-    - type: recall_at_1000
-      value: 98.039
-    - type: recall_at_3
-      value: 89.581
-    - type: recall_at_5
-      value: 91.906
-  - task:
-      type: Retrieval
-    dataset:
-      type: fiqa
-      name: MTEB FiQA2018
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 20.5
-    - type: map_at_10
-      value: 32.857
-    - type: map_at_100
-      value: 34.589
-    - type: map_at_1000
-      value: 34.778
-    - type: map_at_3
-      value: 29.160999999999998
-    - type: map_at_5
-      value: 31.033
-    - type: mrr_at_1
-      value: 40.123
-    - type: mrr_at_10
-      value: 48.776
-    - type: mrr_at_100
-      value: 49.495
-    - type: mrr_at_1000
-      value: 49.539
-    - type: mrr_at_3
-      value: 46.605000000000004
-    - type: mrr_at_5
-      value: 47.654
-    - type: ndcg_at_1
-      value: 40.123
-    - type: ndcg_at_10
-      value: 40.343
-    - type: ndcg_at_100
-      value: 46.56
-    - type: ndcg_at_1000
-      value: 49.777
-    - type: ndcg_at_3
-      value: 37.322
-    - type: ndcg_at_5
-      value: 37.791000000000004
-    - type: precision_at_1
-      value: 40.123
-    - type: precision_at_10
-      value: 11.08
-    - type: precision_at_100
-      value: 1.752
-    - type: precision_at_1000
-      value: 0.232
-    - type: precision_at_3
-      value: 24.897
-    - type: precision_at_5
-      value: 17.809
-    - type: recall_at_1
-      value: 20.5
-    - type: recall_at_10
-      value: 46.388
-    - type: recall_at_100
-      value: 69.552
-    - type: recall_at_1000
-      value: 89.011
-    - type: recall_at_3
-      value: 33.617999999999995
-    - type: recall_at_5
-      value: 38.211
-  - task:
-      type: Retrieval
-    dataset:
-      type: hotpotqa
-      name: MTEB HotpotQA
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 39.135999999999996
-    - type: map_at_10
-      value: 61.673
-    - type: map_at_100
-      value: 62.562
-    - type: map_at_1000
-      value: 62.62
-    - type: map_at_3
-      value: 58.467999999999996
-    - type: map_at_5
-      value: 60.463
-    - type: mrr_at_1
-      value: 78.271
-    - type: mrr_at_10
-      value: 84.119
-    - type: mrr_at_100
-      value: 84.29299999999999
-    - type: mrr_at_1000
-      value: 84.299
-    - type: mrr_at_3
-      value: 83.18900000000001
-    - type: mrr_at_5
-      value: 83.786
-    - type: ndcg_at_1
-      value: 78.271
-    - type: ndcg_at_10
-      value: 69.935
-    - type: ndcg_at_100
-      value: 73.01299999999999
-    - type: ndcg_at_1000
-      value: 74.126
-    - type: ndcg_at_3
-      value: 65.388
-    - type: ndcg_at_5
-      value: 67.906
-    - type: precision_at_1
-      value: 78.271
-    - type: precision_at_10
-      value: 14.562
-    - type: precision_at_100
-      value: 1.6969999999999998
-    - type: precision_at_1000
-      value: 0.184
-    - type: precision_at_3
-      value: 41.841
-    - type: precision_at_5
-      value: 27.087
-    - type: recall_at_1
-      value: 39.135999999999996
-    - type: recall_at_10
-      value: 72.809
-    - type: recall_at_100
-      value: 84.86200000000001
-    - type: recall_at_1000
-      value: 92.208
-    - type: recall_at_3
-      value: 62.76199999999999
-    - type: recall_at_5
-      value: 67.718
-  - task:
-      type: Classification
-    dataset:
-      type: mteb/imdb
-      name: MTEB ImdbClassification
-      config: default
-      split: test
-      revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
-    metrics:
-    - type: accuracy
-      value: 90.60600000000001
-    - type: ap
-      value: 86.6579587804335
-    - type: f1
-      value: 90.5938853929307
-  - task:
-      type: Retrieval
-    dataset:
-      type: msmarco
-      name: MTEB MSMARCO
-      config: default
-      split: dev
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 21.852
-    - type: map_at_10
-      value: 33.982
-    - type: map_at_100
-      value: 35.116
-    - type: map_at_1000
-      value: 35.167
-    - type: map_at_3
-      value: 30.134
-    - type: map_at_5
-      value: 32.340999999999994
-    - type: mrr_at_1
-      value: 22.479
-    - type: mrr_at_10
-      value: 34.594
-    - type: mrr_at_100
-      value: 35.672
-    - type: mrr_at_1000
-      value: 35.716
-    - type: mrr_at_3
-      value: 30.84
-    - type: mrr_at_5
-      value: 32.998
-    - type: ndcg_at_1
-      value: 22.493
-    - type: ndcg_at_10
-      value: 40.833000000000006
-    - type: ndcg_at_100
-      value: 46.357
-    - type: ndcg_at_1000
-      value: 47.637
-    - type: ndcg_at_3
-      value: 32.995999999999995
-    - type: ndcg_at_5
-      value: 36.919000000000004
-    - type: precision_at_1
-      value: 22.493
-    - type: precision_at_10
-      value: 6.465999999999999
-    - type: precision_at_100
-      value: 0.9249999999999999
-    - type: precision_at_1000
-      value: 0.104
-    - type: precision_at_3
-      value: 14.030999999999999
-    - type: precision_at_5
-      value: 10.413
-    - type: recall_at_1
-      value: 21.852
-    - type: recall_at_10
-      value: 61.934999999999995
-    - type: recall_at_100
-      value: 87.611
-    - type: recall_at_1000
-      value: 97.441
-    - type: recall_at_3
-      value: 40.583999999999996
-    - type: recall_at_5
-      value: 49.992999999999995
-  - task:
-      type: Classification
-    dataset:
-      type: mteb/mtop_domain
-      name: MTEB MTOPDomainClassification (en)
-      config: en
-      split: test
-      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
-    metrics:
-    - type: accuracy
-      value: 93.36069311445507
-    - type: f1
-      value: 93.16456330371453
-  - task:
-      type: Classification
-    dataset:
-      type: mteb/mtop_intent
-      name: MTEB MTOPIntentClassification (en)
-      config: en
-      split: test
-      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
-    metrics:
-    - type: accuracy
-      value: 74.74692202462381
-    - type: f1
-      value: 58.17903579421599
-  - task:
-      type: Classification
-    dataset:
-      type: mteb/amazon_massive_intent
-      name: MTEB MassiveIntentClassification (en)
-      config: en
-      split: test
-      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
-    metrics:
-    - type: accuracy
-      value: 74.80833893745796
-    - type: f1
-      value: 72.70786592684664
-  - task:
-      type: Classification
-    dataset:
-      type: mteb/amazon_massive_scenario
-      name: MTEB MassiveScenarioClassification (en)
-      config: en
-      split: test
-      revision: 7d571f92784cd94a019292a1f45445077d0ef634
-    metrics:
-    - type: accuracy
-      value: 78.69872225958305
-    - type: f1
-      value: 78.61626934504731
-  - task:
-      type: Clustering
-    dataset:
-      type: mteb/medrxiv-clustering-p2p
-      name: MTEB MedrxivClusteringP2P
-      config: default
-      split: test
-      revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
-    metrics:
-    - type: v_measure
-      value: 33.058658628717694
-  - task:
-      type: Clustering
-    dataset:
-      type: mteb/medrxiv-clustering-s2s
-      name: MTEB MedrxivClusteringS2S
-      config: default
-      split: test
-      revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
-    metrics:
-    - type: v_measure
-      value: 30.85561739360599
-  - task:
-      type: Reranking
-    dataset:
-      type: mteb/mind_small
-      name: MTEB MindSmallReranking
-      config: default
-      split: test
-      revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
-    metrics:
-    - type: map
-      value: 31.290259910144385
-    - type: mrr
-      value: 32.44223046102856
-  - task:
-      type: Retrieval
-    dataset:
-      type: nfcorpus
-      name: MTEB NFCorpus
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 5.288
-    - type: map_at_10
-      value: 12.267999999999999
-    - type: map_at_100
-      value: 15.557000000000002
-    - type: map_at_1000
-      value: 16.98
-    - type: map_at_3
-      value: 8.866
-    - type: map_at_5
-      value: 10.418
-    - type: mrr_at_1
-      value: 43.653
-    - type: mrr_at_10
-      value: 52.681
-    - type: mrr_at_100
-      value: 53.315999999999995
-    - type: mrr_at_1000
-      value: 53.357
-    - type: mrr_at_3
-      value: 51.393
-    - type: mrr_at_5
-      value: 51.903999999999996
-    - type: ndcg_at_1
-      value: 42.415000000000006
-    - type: ndcg_at_10
-      value: 34.305
-    - type: ndcg_at_100
-      value: 30.825999999999997
-    - type: ndcg_at_1000
-      value: 39.393
-    - type: ndcg_at_3
-      value: 39.931
-    - type: ndcg_at_5
-      value: 37.519999999999996
-    - type: precision_at_1
-      value: 43.653
-    - type: precision_at_10
-      value: 25.728
-    - type: precision_at_100
-      value: 7.932
-    - type: precision_at_1000
-      value: 2.07
-    - type: precision_at_3
-      value: 38.184000000000005
-    - type: precision_at_5
-      value: 32.879000000000005
-    - type: recall_at_1
-      value: 5.288
-    - type: recall_at_10
-      value: 16.195
-    - type: recall_at_100
-      value: 31.135
-    - type: recall_at_1000
-      value: 61.531000000000006
-    - type: recall_at_3
-      value: 10.313
-    - type: recall_at_5
-      value: 12.754999999999999
-  - task:
-      type: Retrieval
-    dataset:
-      type: nq
-      name: MTEB NQ
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 28.216
-    - type: map_at_10
-      value: 42.588
-    - type: map_at_100
-      value: 43.702999999999996
-    - type: map_at_1000
-      value: 43.739
-    - type: map_at_3
-      value: 38.177
-    - type: map_at_5
-      value: 40.754000000000005
-    - type: mrr_at_1
-      value: 31.866
-    - type: mrr_at_10
-      value: 45.189
-    - type: mrr_at_100
-      value: 46.056000000000004
-    - type: mrr_at_1000
-      value: 46.081
-    - type: mrr_at_3
-      value: 41.526999999999994
-    - type: mrr_at_5
-      value: 43.704
-    - type: ndcg_at_1
-      value: 31.837
-    - type: ndcg_at_10
-      value: 50.178
-    - type: ndcg_at_100
-      value: 54.98800000000001
-    - type: ndcg_at_1000
-      value: 55.812
-    - type: ndcg_at_3
-      value: 41.853
-    - type: ndcg_at_5
-      value: 46.153
-    - type: precision_at_1
-      value: 31.837
-    - type: precision_at_10
-      value: 8.43
-    - type: precision_at_100
-      value: 1.1119999999999999
-    - type: precision_at_1000
-      value: 0.11900000000000001
-    - type: precision_at_3
-      value: 19.023
-    - type: precision_at_5
-      value: 13.911000000000001
-    - type: recall_at_1
-      value: 28.216
-    - type: recall_at_10
-      value: 70.8
-    - type: recall_at_100
-      value: 91.857
-    - type: recall_at_1000
-      value: 97.941
-    - type: recall_at_3
-      value: 49.196
-    - type: recall_at_5
-      value: 59.072
-  - task:
-      type: Retrieval
-    dataset:
-      type: quora
-      name: MTEB QuoraRetrieval
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 71.22800000000001
-    - type: map_at_10
-      value: 85.115
-    - type: map_at_100
-      value: 85.72
-    - type: map_at_1000
-      value: 85.737
-    - type: map_at_3
-      value: 82.149
-    - type: map_at_5
-      value: 84.029
-    - type: mrr_at_1
-      value: 81.96
-    - type: mrr_at_10
-      value: 88.00200000000001
-    - type: mrr_at_100
-      value: 88.088
-    - type: mrr_at_1000
-      value: 88.089
-    - type: mrr_at_3
-      value: 87.055
-    - type: mrr_at_5
-      value: 87.715
-    - type: ndcg_at_1
-      value: 82.01
-    - type: ndcg_at_10
-      value: 88.78
-    - type: ndcg_at_100
-      value: 89.91
-    - type: ndcg_at_1000
-      value: 90.013
-    - type: ndcg_at_3
-      value: 85.957
-    - type: ndcg_at_5
-      value: 87.56
-    - type: precision_at_1
-      value: 82.01
-    - type: precision_at_10
-      value: 13.462
-    - type: precision_at_100
-      value: 1.528
-    - type: precision_at_1000
-      value: 0.157
-    - type: precision_at_3
-      value: 37.553
-    - type: precision_at_5
-      value: 24.732000000000003
-    - type: recall_at_1
-      value: 71.22800000000001
-    - type: recall_at_10
-      value: 95.69
-    - type: recall_at_100
-      value: 99.531
-    - type: recall_at_1000
-      value: 99.98
-    - type: recall_at_3
-      value: 87.632
-    - type: recall_at_5
-      value: 92.117
-  - task:
-      type: Clustering
-    dataset:
-      type: mteb/reddit-clustering
-      name: MTEB RedditClustering
-      config: default
-      split: test
-      revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
-    metrics:
-    - type: v_measure
-      value: 52.31768034366916
-  - task:
-      type: Clustering
-    dataset:
-      type: mteb/reddit-clustering-p2p
-      name: MTEB RedditClusteringP2P
-      config: default
-      split: test
-      revision: 282350215ef01743dc01b456c7f5241fa8937f16
-    metrics:
-    - type: v_measure
-      value: 60.640266772723606
-  - task:
-      type: Retrieval
-    dataset:
-      type: scidocs
-      name: MTEB SCIDOCS
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 4.7780000000000005
-    - type: map_at_10
-      value: 12.299
-    - type: map_at_100
-      value: 14.363000000000001
-    - type: map_at_1000
-      value: 14.71
-    - type: map_at_3
-      value: 8.738999999999999
-    - type: map_at_5
-      value: 10.397
-    - type: mrr_at_1
-      value: 23.599999999999998
-    - type: mrr_at_10
-      value: 34.845
-    - type: mrr_at_100
-      value: 35.916
-    - type: mrr_at_1000
-      value: 35.973
-    - type: mrr_at_3
-      value: 31.7
-    - type: mrr_at_5
-      value: 33.535
-    - type: ndcg_at_1
-      value: 23.599999999999998
-    - type: ndcg_at_10
-      value: 20.522000000000002
-    - type: ndcg_at_100
-      value: 28.737000000000002
-    - type: ndcg_at_1000
-      value: 34.596
-    - type: ndcg_at_3
-      value: 19.542
-    - type: ndcg_at_5
-      value: 16.958000000000002
-    - type: precision_at_1
-      value: 23.599999999999998
-    - type: precision_at_10
-      value: 10.67
-    - type: precision_at_100
-      value: 2.259
-    - type: precision_at_1000
-      value: 0.367
-    - type: precision_at_3
-      value: 18.333
-    - type: precision_at_5
-      value: 14.879999999999999
-    - type: recall_at_1
-      value: 4.7780000000000005
-    - type: recall_at_10
-      value: 21.617
-    - type: recall_at_100
-      value: 45.905
-    - type: recall_at_1000
-      value: 74.42
-    - type: recall_at_3
-      value: 11.148
-    - type: recall_at_5
-      value: 15.082999999999998
-  - task:
-      type: STS
-    dataset:
-      type: mteb/sickr-sts
-      name: MTEB SICK-R
-      config: default
-      split: test
-      revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
-    metrics:
-    - type: cos_sim_pearson
-      value: 83.22372750297885
-    - type: cos_sim_spearman
-      value: 79.40972617119405
-    - type: euclidean_pearson
-      value: 80.6101072020434
-    - type: euclidean_spearman
-      value: 79.53844217225202
-    - type: manhattan_pearson
-      value: 80.57265975286111
-    - type: manhattan_spearman
-      value: 79.46335611792958
-  - task:
-      type: STS
-    dataset:
-      type: mteb/sts12-sts
-      name: MTEB STS12
-      config: default
-      split: test
-      revision: a0d554a64d88156834ff5ae9920b964011b16384
-    metrics:
-    - type: cos_sim_pearson
-      value: 85.43713315520749
-    - type: cos_sim_spearman
-      value: 77.44128693329532
-    - type: euclidean_pearson
-      value: 81.63869928101123
-    - type: euclidean_spearman
-      value: 77.29512977961515
-    - type: manhattan_pearson
-      value: 81.63704185566183
-    - type: manhattan_spearman
-      value: 77.29909412738657
-  - task:
-      type: STS
-    dataset:
-      type: mteb/sts13-sts
-      name: MTEB STS13
-      config: default
-      split: test
-      revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
-    metrics:
-    - type: cos_sim_pearson
-      value: 81.59451537860527
-    - type: cos_sim_spearman
-      value: 82.97994638856723
-    - type: euclidean_pearson
-      value: 82.89478688288412
-    - type: euclidean_spearman
-      value: 83.58740751053104
-    - type: manhattan_pearson
-      value: 82.69140840941608
-    - type: manhattan_spearman
-      value: 83.33665956040555
-  - task:
-      type: STS
-    dataset:
-      type: mteb/sts14-sts
-      name: MTEB STS14
-      config: default
-      split: test
-      revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
-    metrics:
-    - type: cos_sim_pearson
-      value: 82.00756527711764
-    - type: cos_sim_spearman
-      value: 81.83560996841379
-    - type: euclidean_pearson
-      value: 82.07684151976518
-    - type: euclidean_spearman
-      value: 82.00913052060511
-    - type: manhattan_pearson
-      value: 82.05690778488794
-    - type: manhattan_spearman
-      value: 82.02260252019525
-  - task:
-      type: STS
-    dataset:
-      type: mteb/sts15-sts
-      name: MTEB STS15
-      config: default
-      split: test
-      revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
-    metrics:
-    - type: cos_sim_pearson
-      value: 86.13710262895447
-    - type: cos_sim_spearman
-      value: 87.26412811156248
-    - type: euclidean_pearson
-      value: 86.94151453230228
-    - type: euclidean_spearman
-      value: 87.5363796699571
-    - type: manhattan_pearson
-      value: 86.86989424083748
-    - type: manhattan_spearman
-      value: 87.47315940781353
-  - task:
-      type: STS
-    dataset:
-      type: mteb/sts16-sts
-      name: MTEB STS16
-      config: default
-      split: test
-      revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
-    metrics:
-    - type: cos_sim_pearson
-      value: 83.0230597603627
-    - type: cos_sim_spearman
-      value: 84.93344499318864
-    - type: euclidean_pearson
-      value: 84.23754743431141
-    - type: euclidean_spearman
-      value: 85.09707376597099
-    - type: manhattan_pearson
-      value: 84.04325160987763
-    - type: manhattan_spearman
-      value: 84.89353071339909
-  - task:
-      type: STS
-    dataset:
-      type: mteb/sts17-crosslingual-sts
-      name: MTEB STS17 (en-en)
-      config: en-en
-      split: test
-      revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
-    metrics:
-    - type: cos_sim_pearson
-      value: 86.75620824563921
-    - type: cos_sim_spearman
-      value: 87.15065513706398
-    - type: euclidean_pearson
-      value: 88.26281533633521
-    - type: euclidean_spearman
-      value: 87.51963738643983
-    - type: manhattan_pearson
-      value: 88.25599267618065
-    - type: manhattan_spearman
-      value: 87.58048736047483
-  - task:
-      type: STS
-    dataset:
-      type: mteb/sts22-crosslingual-sts
-      name: MTEB STS22 (en)
-      config: en
-      split: test
-      revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
-    metrics:
-    - type: cos_sim_pearson
-      value: 64.74645319195137
-    - type: cos_sim_spearman
-      value: 65.29996325037214
-    - type: euclidean_pearson
-      value: 67.04297794086443
-    - type: euclidean_spearman
-      value: 65.43841726694343
-    - type: manhattan_pearson
-      value: 67.39459955690904
-    - type: manhattan_spearman
-      value: 65.92864704413651
-  - task:
-      type: STS
-    dataset:
-      type: mteb/stsbenchmark-sts
-      name: MTEB STSBenchmark
-      config: default
-      split: test
-      revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
-    metrics:
-    - type: cos_sim_pearson
-      value: 84.31291020270801
-    - type: cos_sim_spearman
-      value: 85.86473738688068
-    - type: euclidean_pearson
-      value: 85.65537275064152
-    - type: euclidean_spearman
-      value: 86.13087454209642
-    - type: manhattan_pearson
-      value: 85.43946955047609
-    - type: manhattan_spearman
-      value: 85.91568175344916
-  - task:
-      type: Reranking
-    dataset:
-      type: mteb/scidocs-reranking
-      name: MTEB SciDocsRR
-      config: default
-      split: test
-      revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
-    metrics:
-    - type: map
-      value: 85.93798118350695
-    - type: mrr
-      value: 95.93536274908824
-  - task:
-      type: Retrieval
-    dataset:
-      type: scifact
-      name: MTEB SciFact
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 57.594
-    - type: map_at_10
-      value: 66.81899999999999
-    - type: map_at_100
-      value: 67.368
-    - type: map_at_1000
-      value: 67.4
-    - type: map_at_3
-      value: 64.061
-    - type: map_at_5
-      value: 65.47
-    - type: mrr_at_1
-      value: 60.667
-    - type: mrr_at_10
-      value: 68.219
-    - type: mrr_at_100
-      value: 68.655
-    - type: mrr_at_1000
-      value: 68.684
-    - type: mrr_at_3
-      value: 66.22200000000001
-    - type: mrr_at_5
-      value: 67.289
-    - type: ndcg_at_1
-      value: 60.667
-    - type: ndcg_at_10
-      value: 71.275
-    - type: ndcg_at_100
-      value: 73.642
-    - type: ndcg_at_1000
-      value: 74.373
-    - type: ndcg_at_3
-      value: 66.521
-    - type: ndcg_at_5
-      value: 68.581
-    - type: precision_at_1
-      value: 60.667
-    - type: precision_at_10
-      value: 9.433
-    - type: precision_at_100
-      value: 1.0699999999999998
-    - type: precision_at_1000
-      value: 0.11299999999999999
-    - type: precision_at_3
-      value: 25.556
-    - type: precision_at_5
-      value: 16.8
-    - type: recall_at_1
-      value: 57.594
-    - type: recall_at_10
-      value: 83.622
-    - type: recall_at_100
-      value: 94.167
-    - type: recall_at_1000
-      value: 99.667
-    - type: recall_at_3
-      value: 70.64399999999999
-    - type: recall_at_5
-      value: 75.983
-  - task:
-      type: PairClassification
-    dataset:
-      type: mteb/sprintduplicatequestions-pairclassification
-      name: MTEB SprintDuplicateQuestions
-      config: default
-      split: test
-      revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
-    metrics:
-    - type: cos_sim_accuracy
-      value: 99.85841584158416
-    - type: cos_sim_ap
-      value: 96.66996142314342
-    - type: cos_sim_f1
-      value: 92.83208020050125
-    - type: cos_sim_precision
-      value: 93.06532663316584
-    - type: cos_sim_recall
-      value: 92.60000000000001
-    - type: dot_accuracy
-      value: 99.85841584158416
-    - type: dot_ap
-      value: 96.6775307676576
-    - type: dot_f1
-      value: 92.69289729177312
-    - type: dot_precision
-      value: 94.77533960292581
-    - type: dot_recall
-      value: 90.7
-    - type: euclidean_accuracy
-      value: 99.86138613861387
-    - type: euclidean_ap
-      value: 96.6338454403108
-    - type: euclidean_f1
-      value: 92.92214357937311
-    - type: euclidean_precision
-      value: 93.96728016359918
-    - type: euclidean_recall
-      value: 91.9
-    - type: manhattan_accuracy
-      value: 99.86237623762376
-    - type: manhattan_ap
-      value: 96.60370449645053
-    - type: manhattan_f1
-      value: 92.91177970423253
-    - type: manhattan_precision
-      value: 94.7970863683663
-    - type: manhattan_recall
-      value: 91.10000000000001
-    - type: max_accuracy
-      value: 99.86237623762376
-    - type: max_ap
-      value: 96.6775307676576
-    - type: max_f1
-      value: 92.92214357937311
-  - task:
-      type: Clustering
-    dataset:
-      type: mteb/stackexchange-clustering
-      name: MTEB StackExchangeClustering
-      config: default
-      split: test
-      revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
-    metrics:
-    - type: v_measure
-      value: 60.77977058695198
-  - task:
-      type: Clustering
-    dataset:
-      type: mteb/stackexchange-clustering-p2p
-      name: MTEB StackExchangeClusteringP2P
-      config: default
-      split: test
-      revision: 815ca46b2622cec33ccafc3735d572c266efdb44
-    metrics:
-    - type: v_measure
-      value: 35.2725272535638
-  - task:
-      type: Reranking
-    dataset:
-      type: mteb/stackoverflowdupquestions-reranking
-      name: MTEB StackOverflowDupQuestions
-      config: default
-      split: test
-      revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
-    metrics:
-    - type: map
-      value: 53.64052466362125
-    - type: mrr
-      value: 54.533067014684654
-  - task:
-      type: Summarization
-    dataset:
-      type: mteb/summeval
-      name: MTEB SummEval
-      config: default
-      split: test
-      revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
-    metrics:
-    - type: cos_sim_pearson
-      value: 30.677624219206578
-    - type: cos_sim_spearman
-      value: 30.121368518123447
-    - type: dot_pearson
-      value: 30.69870088041608
-    - type: dot_spearman
-      value: 29.61284927093751
-  - task:
-      type: Retrieval
-    dataset:
-      type: trec-covid
-      name: MTEB TRECCOVID
-      config: default
-      split: test
-      revision: None
-    metrics:
-    - type: map_at_1
-      value: 0.22
-    - type: map_at_10
-      value: 1.855
-    - type: map_at_100
-      value: 9.885
-    - type: map_at_1000
-      value: 23.416999999999998
-    - type: map_at_3
-      value: 0.637
-    - type: map_at_5
-      value: 1.024
-    - type: mrr_at_1
-      value: 88.0
-    - type: mrr_at_10
-      value: 93.067
-    - type: mrr_at_100
-      value: 93.067
-    - type: mrr_at_1000
-      value: 93.067
-    - type: mrr_at_3
-      value: 92.667
-    - type: mrr_at_5
-      value: 93.067
-    - type: ndcg_at_1
-      value: 82.0
-    - type: ndcg_at_10
-      value: 75.899
-    - type: ndcg_at_100
-      value: 55.115
-    - type: ndcg_at_1000
-      value: 48.368
-    - type: ndcg_at_3
-      value: 79.704
-    - type: ndcg_at_5
-      value: 78.39699999999999
-    - type: precision_at_1
-      value: 88.0
-    - type: precision_at_10
-      value: 79.60000000000001
-    - type: precision_at_100
-      value: 56.06
-    - type: precision_at_1000
-      value: 21.206
-    - type: precision_at_3
-      value: 84.667
-    - type: precision_at_5
-      value: 83.2
-    - type: recall_at_1
-      value: 0.22
2375
- - type: recall_at_10
2376
- value: 2.078
2377
- - type: recall_at_100
2378
- value: 13.297
2379
- - type: recall_at_1000
2380
- value: 44.979
2381
- - type: recall_at_3
2382
- value: 0.6689999999999999
2383
- - type: recall_at_5
2384
- value: 1.106
2385
- - task:
2386
- type: Retrieval
2387
- dataset:
2388
- type: webis-touche2020
2389
- name: MTEB Touche2020
2390
- config: default
2391
- split: test
2392
- revision: None
2393
- metrics:
2394
- - type: map_at_1
2395
- value: 2.258
2396
- - type: map_at_10
2397
- value: 10.439
2398
- - type: map_at_100
2399
- value: 16.89
2400
- - type: map_at_1000
2401
- value: 18.407999999999998
2402
- - type: map_at_3
2403
- value: 5.668
2404
- - type: map_at_5
2405
- value: 7.718
2406
- - type: mrr_at_1
2407
- value: 32.653
2408
- - type: mrr_at_10
2409
- value: 51.159
2410
- - type: mrr_at_100
2411
- value: 51.714000000000006
2412
- - type: mrr_at_1000
2413
- value: 51.714000000000006
2414
- - type: mrr_at_3
2415
- value: 47.959
2416
- - type: mrr_at_5
2417
- value: 50.407999999999994
2418
- - type: ndcg_at_1
2419
- value: 29.592000000000002
2420
- - type: ndcg_at_10
2421
- value: 26.037
2422
- - type: ndcg_at_100
2423
- value: 37.924
2424
- - type: ndcg_at_1000
2425
- value: 49.126999999999995
2426
- - type: ndcg_at_3
2427
- value: 30.631999999999998
2428
- - type: ndcg_at_5
2429
- value: 28.571
2430
- - type: precision_at_1
2431
- value: 32.653
2432
- - type: precision_at_10
2433
- value: 22.857
2434
- - type: precision_at_100
2435
- value: 7.754999999999999
2436
- - type: precision_at_1000
2437
- value: 1.529
2438
- - type: precision_at_3
2439
- value: 34.014
2440
- - type: precision_at_5
2441
- value: 29.796
2442
- - type: recall_at_1
2443
- value: 2.258
2444
- - type: recall_at_10
2445
- value: 16.554
2446
- - type: recall_at_100
2447
- value: 48.439
2448
- - type: recall_at_1000
2449
- value: 82.80499999999999
2450
- - type: recall_at_3
2451
- value: 7.283
2452
- - type: recall_at_5
2453
- value: 10.732
2454
- - task:
2455
- type: Classification
2456
- dataset:
2457
- type: mteb/toxic_conversations_50k
2458
- name: MTEB ToxicConversationsClassification
2459
- config: default
2460
- split: test
2461
- revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
2462
- metrics:
2463
- - type: accuracy
2464
- value: 69.8858
2465
- - type: ap
2466
- value: 13.835684144362109
2467
- - type: f1
2468
- value: 53.803351693244586
2469
- - task:
2470
- type: Classification
2471
- dataset:
2472
- type: mteb/tweet_sentiment_extraction
2473
- name: MTEB TweetSentimentExtractionClassification
2474
- config: default
2475
- split: test
2476
- revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
2477
- metrics:
2478
- - type: accuracy
2479
- value: 60.50650820599886
2480
- - type: f1
2481
- value: 60.84357825979259
2482
- - task:
2483
- type: Clustering
2484
- dataset:
2485
- type: mteb/twentynewsgroups-clustering
2486
- name: MTEB TwentyNewsgroupsClustering
2487
- config: default
2488
- split: test
2489
- revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
2490
- metrics:
2491
- - type: v_measure
2492
- value: 48.52131044852134
2493
- - task:
2494
- type: PairClassification
2495
- dataset:
2496
- type: mteb/twittersemeval2015-pairclassification
2497
- name: MTEB TwitterSemEval2015
2498
- config: default
2499
- split: test
2500
- revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
2501
- metrics:
2502
- - type: cos_sim_accuracy
2503
- value: 85.59337187816654
2504
- - type: cos_sim_ap
2505
- value: 73.23925826533437
2506
- - type: cos_sim_f1
2507
- value: 67.34693877551021
2508
- - type: cos_sim_precision
2509
- value: 62.40432237730752
2510
- - type: cos_sim_recall
2511
- value: 73.13984168865434
2512
- - type: dot_accuracy
2513
- value: 85.31322644096085
2514
- - type: dot_ap
2515
- value: 72.30723963807422
2516
- - type: dot_f1
2517
- value: 66.47051612112296
2518
- - type: dot_precision
2519
- value: 62.0792305930845
2520
- - type: dot_recall
2521
- value: 71.53034300791556
2522
- - type: euclidean_accuracy
2523
- value: 85.61125350181797
2524
- - type: euclidean_ap
2525
- value: 73.32843720487845
2526
- - type: euclidean_f1
2527
- value: 67.36549633745895
2528
- - type: euclidean_precision
2529
- value: 64.60755813953489
2530
- - type: euclidean_recall
2531
- value: 70.36939313984169
2532
- - type: manhattan_accuracy
2533
- value: 85.63509566668654
2534
- - type: manhattan_ap
2535
- value: 73.16658488311325
2536
- - type: manhattan_f1
2537
- value: 67.20597386434349
2538
- - type: manhattan_precision
2539
- value: 63.60424028268551
2540
- - type: manhattan_recall
2541
- value: 71.2401055408971
2542
- - type: max_accuracy
2543
- value: 85.63509566668654
2544
- - type: max_ap
2545
- value: 73.32843720487845
2546
- - type: max_f1
2547
- value: 67.36549633745895
2548
- - task:
2549
- type: PairClassification
2550
- dataset:
2551
- type: mteb/twitterurlcorpus-pairclassification
2552
- name: MTEB TwitterURLCorpus
2553
- config: default
2554
- split: test
2555
- revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
2556
- metrics:
2557
- - type: cos_sim_accuracy
2558
- value: 88.33779640625606
2559
- - type: cos_sim_ap
2560
- value: 84.83868375898157
2561
- - type: cos_sim_f1
2562
- value: 77.16506154017773
2563
- - type: cos_sim_precision
2564
- value: 74.62064005753327
2565
- - type: cos_sim_recall
2566
- value: 79.88912842623961
2567
- - type: dot_accuracy
2568
- value: 88.02732176815307
2569
- - type: dot_ap
2570
- value: 83.95089283763002
2571
- - type: dot_f1
2572
- value: 76.29635101196631
2573
- - type: dot_precision
2574
- value: 73.31771720613288
2575
- - type: dot_recall
2576
- value: 79.52725592854944
2577
- - type: euclidean_accuracy
2578
- value: 88.44452206310397
2579
- - type: euclidean_ap
2580
- value: 84.98384576824827
2581
- - type: euclidean_f1
2582
- value: 77.29311047696697
2583
- - type: euclidean_precision
2584
- value: 74.51232583065381
2585
- - type: euclidean_recall
2586
- value: 80.28949799815214
2587
- - type: manhattan_accuracy
2588
- value: 88.47362906042613
2589
- - type: manhattan_ap
2590
- value: 84.91421462218432
2591
- - type: manhattan_f1
2592
- value: 77.05107637204792
2593
- - type: manhattan_precision
2594
- value: 74.74484256243214
2595
- - type: manhattan_recall
2596
- value: 79.50415768401602
2597
- - type: max_accuracy
2598
- value: 88.47362906042613
2599
- - type: max_ap
2600
- value: 84.98384576824827
2601
- - type: max_f1
2602
- value: 77.29311047696697
2603
  license: mit
2604
  language:
2605
  - en
2606
  ---
2607
 
2608
 
2609
- <h1 align="center">FlagEmbedding</h1>
2610
 
2611
 
2612
- <h4 align="center">
2613
- <p>
2614
- <a href=#model-list>Model List</a> |
2615
- <a href=#frequently-asked-questions>FAQ</a> |
2616
- <a href=#usage>Usage</a> |
2617
- <a href="#evaluation">Evaluation</a> |
2618
- <a href="#train">Train</a> |
2619
- <a href="#contact">Contact</a> |
2620
- <a href="#citation">Citation</a> |
2621
- <a href="#license">License</a>
2622
- </p>
2623
- </h4>
2624
 
2625
- For more details, please refer to our GitHub repo: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding).
2626
-
2627
- If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using [bge-m3](https://huggingface.co/BAAI/bge-m3).
2628
-
2629
-
2630
- [English](README.md) | [中文](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md)
2631
-
2632
- FlagEmbedding focuses on retrieval-augmented LLMs and currently consists of the following projects:
2633
-
2634
- - **Long-Context LLM**: [Activation Beacon](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon)
2635
- - **Fine-tuning of LM** : [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail)
2636
- - **Dense Retrieval**: [BGE-M3](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3), [LLM Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), [BGE Embedding](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding)
2637
- - **Reranker Model**: [BGE Reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker)
2638
- - **Benchmark**: [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB)
2639
-
2640
- ## News
2641
- - 1/30/2024: Release **BGE-M3**, a new member of the BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval).
2642
- It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks.
2643
- [Technical Report](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/BGE_M3/BGE_M3.pdf) and [Code](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3). :fire:
2644
- - 1/9/2024: Release [Activation-Beacon](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon), an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. [Technical Report](https://arxiv.org/abs/2401.03462) :fire:
2645
- - 12/24/2023: Release **LLaRA**, a LLaMA-7B-based dense retriever that achieves state-of-the-art performance on MS MARCO and BEIR. The model and code will be open-sourced; please stay tuned. [Technical Report](https://arxiv.org/abs/2312.15503) :fire:
2646
- - 11/23/2023: Release [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail), a method to maintain general capabilities during fine-tuning by merging multiple language models. [Technical Report](https://arxiv.org/abs/2311.13534) :fire:
2647
- - 10/12/2023: Release [LLM-Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), a unified embedding model to support diverse retrieval augmentation needs for LLMs. [Technical Report](https://arxiv.org/pdf/2310.07554.pdf)
2648
- - 09/15/2023: The [technical report](https://arxiv.org/pdf/2309.07597.pdf) of BGE has been released
2649
- - 09/15/2023: The [massive training data](https://data.baai.ac.cn/details/BAAI-MTP) of BGE has been released
2650
- - 09/12/2023: New models:
2651
- - **New reranker model**: release the cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than the embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models.
2652
- - **Updated embedding model**: release the `bge-*-v1.5` embedding models to alleviate the issue of the similarity distribution and enhance their retrieval ability without instruction.
2653
-
2654
-
2655
- <details>
2656
- <summary>More</summary>
2657
- <!-- ### More -->
2658
-
2659
- - 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): add a script to mine hard negatives and support adding an instruction during fine-tuning.
2660
- - 08/09/2023: BGE models are integrated into **Langchain**; you can use them like [this](#using-langchain). The C-MTEB **leaderboard** is [available](https://huggingface.co/spaces/mteb/leaderboard).
2661
- - 08/05/2023: Release base-scale and small-scale models with the **best performance among models of the same size 🤗**
2662
- - 08/02/2023: Release the `bge-large-*` (short for BAAI General Embedding) models, which **rank 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada:
2663
- - 08/01/2023: We release the [Chinese Massive Text Embedding Benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (**C-MTEB**), consisting of 31 test datasets.
2664
-
2665
- </details>
2666
-
2667
-
2668
- ## Model List
2669
-
2670
- `bge` is short for `BAAI general embedding`.
2671
-
2672
- | Model | Language | | Description | query instruction for retrieval [1] |
2673
- |:-------------------------------|:--------:| :--------:| :--------:|:--------:|
2674
- | [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | Multilingual | [Inference](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3#usage) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3) | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | |
2675
- | [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See [README](./FlagEmbedding/llm_embedder/README.md) |
2676
- | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
2677
- | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
2678
- | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
2679
- | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
2680
- | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
2681
- | [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
2682
- | [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
2683
- | [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
2684
- | [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
2685
- | [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-en` | `Represent this sentence for searching relevant passages: ` |
2686
- | [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) |a small-scale model but with competitive performance | `Represent this sentence for searching relevant passages: ` |
2687
- | [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章:` |
2688
- | [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章:` |
2689
- | [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `为这个句子生成表示以用于检索相关文章:` |
2690
-
2691
- [1\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, and you can use the original query directly. In all cases, **no instruction** needs to be added to passages.
2692
-
2693
- [2\]: Unlike an embedding model, a reranker takes the question and document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, cross-encoders are widely used to re-rank the top-k documents retrieved by simpler models.
2694
- For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results, as sketched below.
2695
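-
- A minimal sketch of this retrieve-then-rerank pipeline, using the `FlagModel` and `FlagReranker` APIs shown in the Usage section below; the tiny `corpus`, the query, and the candidate sizes are hypothetical placeholders.
- ```python
- import numpy as np
- from FlagEmbedding import FlagModel, FlagReranker
-
- corpus = ["The giant panda is a bear species endemic to China.",
-           "Paris is the capital of France.",
-           "Pandas feed almost entirely on bamboo."]  # hypothetical corpus
- query = "what do pandas eat?"
-
- # stage 1: dense retrieval with the bi-encoder
- model = FlagModel('BAAI/bge-small-en-v1.5',
-                   query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ")
- q_emb = model.encode_queries([query])
- p_emb = model.encode(corpus)
- candidates = np.argsort(-(q_emb @ p_emb.T)[0])[:100]  # keep up to the top-100 candidates
-
- # stage 2: re-rank the candidates with the cross-encoder
- reranker = FlagReranker('BAAI/bge-reranker-base')
- scores = reranker.compute_score([[query, corpus[i]] for i in candidates])
- top3 = [corpus[i] for i in candidates[np.argsort(scores)[::-1][:3]]]
- print(top3)
- ```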
-
2696
- All models have been uploaded to Huggingface Hub, and you can see them at https://huggingface.co/BAAI.
2697
- If you cannot open the Huggingface Hub, you can also download the models at https://model.baai.ac.cn/models .
2698
-
2699
-
2700
- ## Frequently asked questions
2701
-
2702
- <details>
2703
- <summary>1. How to fine-tune bge embedding model?</summary>
2704
-
2705
- <!-- ### How to fine-tune bge embedding model? -->
2706
- Follow this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) to prepare data and fine-tune your model.
2707
- Some suggestions:
2708
- - Mine hard negatives following this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives), which can improve retrieval performance.
2709
- - If you pre-train bge on your data, the pre-trained model cannot be used directly to calculate similarity; it must be fine-tuned with contrastive learning first.
2710
- - If the accuracy of the fine-tuned model is still not high, it is recommended to use or fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
2711
-
2712
-
2713
- </details>
2714
-
2715
- <details>
2716
- <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary>
2717
-
2718
- <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 -->
2719
- **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.**
2720
-
2721
- Since we fine-tune the models with contrastive learning at a temperature of 0.01,
2722
- the similarity distribution of the current BGE model is roughly in the interval \[0.6, 1\].
2723
- So a similarity score greater than 0.5 does not indicate that the two sentences are similar.
2724
-
2725
- For downstream tasks, such as passage retrieval or semantic similarity,
2726
- **what matters is the relative order of the scores, not the absolute value.**
2727
- If you need to filter similar sentences based on a similarity threshold,
2728
- please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9).
2729
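-
- As a small illustration (a sketch only; the sentences are toy examples and the 0.85 threshold is a hypothetical value to be tuned on your own data):
- ```python
- from sentence_transformers import SentenceTransformer
-
- model = SentenceTransformer('BAAI/bge-small-en-v1.5')
- pairs = [("A man is eating food.", "A man is eating a meal."),
-          ("A man is eating food.", "The sky is blue.")]
- threshold = 0.85  # hypothetical; pick it from the score distribution on your data
- for a, b in pairs:
-     emb_a, emb_b = model.encode([a, b], normalize_embeddings=True)
-     sim = float(emb_a @ emb_b)  # cosine similarity, since embeddings are normalized
-     print(f"{sim:.3f}", "similar" if sim >= threshold else "not similar")
- ```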
-
2730
- </details>
2731
-
2732
- <details>
2733
- <summary>3. When does the query instruction need to be used</summary>
2734
-
2735
- <!-- ### When does the query instruction need to be used -->
2736
-
2737
- For the `bge-*-v1.5` models, we improved their retrieval ability when no instruction is used.
2738
- Omitting the instruction causes only a slight degradation in retrieval performance compared with using it.
2739
- So, for convenience, you can generate embeddings without an instruction in all cases.
2740
-
2741
- For a retrieval task that uses short queries to find long related documents,
2742
- it is recommended to add instructions for these short queries.
2743
- **The best way to decide whether to add instructions to queries is to choose the setting that achieves better performance on your task.**
2744
- In all cases, no instruction needs to be added to the documents/passages.
2745
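-
- For instance, a quick way to compare both settings on a toy example (a sketch; the query and passages are made up):
- ```python
- from sentence_transformers import SentenceTransformer
-
- model = SentenceTransformer('BAAI/bge-small-en-v1.5')
- instruction = "Represent this sentence for searching relevant passages: "
- query = "how long do pandas live?"
- passages = ["Giant pandas live around 20 years in the wild.",
-             "Paris is the capital of France."]
-
- p_embeddings = model.encode(passages, normalize_embeddings=True)
- for q in (query, instruction + query):  # without vs. with the instruction
-     q_embeddings = model.encode([q], normalize_embeddings=True)
-     print((q_embeddings @ p_embeddings.T).round(3))
- ```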
-
2746
- </details>
2747
 
2748
 
2749
  ## Usage
2750
 
2751
- ### Usage for Embedding Model
2752
-
2753
- Here are some examples of using `bge` models with
2754
- [FlagEmbedding](#using-flagembedding), [Sentence-Transformers](#using-sentence-transformers), [Langchain](#using-langchain), or [Huggingface Transformers](#using-huggingface-transformers).
2755
-
2756
- #### Using FlagEmbedding
2757
- ```
2758
- pip install -U FlagEmbedding
2759
- ```
2760
- If this doesn't work for you, see [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md) for more ways to install FlagEmbedding.
2761
-
2762
- ```python
2763
- from FlagEmbedding import FlagModel
2764
- sentences_1 = ["样例数据-1", "样例数据-2"]
2765
- sentences_2 = ["样例数据-3", "样例数据-4"]
2766
- model = FlagModel('BAAI/bge-large-zh-v1.5',
2767
- query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
2768
- use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
2769
- embeddings_1 = model.encode(sentences_1)
2770
- embeddings_2 = model.encode(sentences_2)
2771
- similarity = embeddings_1 @ embeddings_2.T
2772
- print(similarity)
2773
-
2774
- # for an s2p (short query to long passage) retrieval task, we suggest using encode_queries(), which automatically adds the instruction to each query
2775
- # the corpus in a retrieval task can still be encoded with encode() or encode_corpus(), since passages don't need the instruction
2776
- queries = ['query_1', 'query_2']
2777
- passages = ["样例文档-1", "样例文档-2"]
2778
- q_embeddings = model.encode_queries(queries)
2779
- p_embeddings = model.encode(passages)
2780
- scores = q_embeddings @ p_embeddings.T
2781
- ```
2782
- For the value of the argument `query_instruction_for_retrieval`, see [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list).
2783
-
2784
- By default, FlagModel will use all available GPUs when encoding. Please set `os.environ["CUDA_VISIBLE_DEVICES"]` to select specific GPUs.
2785
- You can also set `os.environ["CUDA_VISIBLE_DEVICES"]=""` to make all GPUs unavailable, as in the snippet below.
2786
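-
- For example (a sketch; note that `CUDA_VISIBLE_DEVICES` must be set before CUDA is initialized):
- ```python
- import os
- os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # encode on GPU 0 only
- # os.environ["CUDA_VISIBLE_DEVICES"] = ""  # uncomment to force CPU encoding
-
- from FlagEmbedding import FlagModel  # import after setting the variable
- model = FlagModel('BAAI/bge-small-en-v1.5')
- embeddings = model.encode(["hello world"])
- ```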
-
2787
-
2788
- #### Using Sentence-Transformers
2789
-
2790
- You can also use the `bge` models with [sentence-transformers](https://www.SBERT.net):
2791
-
2792
- ```
2793
- pip install -U sentence-transformers
2794
- ```
2795
- ```python
2796
- from sentence_transformers import SentenceTransformer
2797
- sentences_1 = ["样例数据-1", "样例数据-2"]
2798
- sentences_2 = ["样例数据-3", "样例数据-4"]
2799
- model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
2800
- embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
2801
- embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
2802
- similarity = embeddings_1 @ embeddings_2.T
2803
- print(similarity)
2804
- ```
2805
- For an s2p (short query to long passage) retrieval task,
2806
- each short query should start with an instruction (see the [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list) for the instructions).
2807
- But the instruction is not needed for passages.
2808
- ```python
2809
- from sentence_transformers import SentenceTransformer
2810
- queries = ['query_1', 'query_2']
2811
- passages = ["样例文档-1", "样例文档-2"]
2812
- instruction = "为这个句子生成表示以用于检索相关文章:"
2813
-
2814
- model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
2815
- q_embeddings = model.encode([instruction+q for q in queries], normalize_embeddings=True)
2816
- p_embeddings = model.encode(passages, normalize_embeddings=True)
2817
- scores = q_embeddings @ p_embeddings.T
2818
- ```
2819
-
2820
- #### Using Langchain
2821
-
2822
- You can use `bge` in LangChain like this:
2823
- ```python
2824
- from langchain.embeddings import HuggingFaceBgeEmbeddings
2825
- model_name = "BAAI/bge-large-en-v1.5"
2826
- model_kwargs = {'device': 'cuda'}
2827
- encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity
2828
- model = HuggingFaceBgeEmbeddings(
2829
- model_name=model_name,
2830
- model_kwargs=model_kwargs,
2831
- encode_kwargs=encode_kwargs,
2832
- query_instruction="为这个句子生成表示以用于检索相关文章:"
2833
- )
2834
- model.query_instruction = "为这个句子生成表示以用于检索相关文章:"
2835
- ```
2836
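-
- Once constructed, the model follows the standard LangChain `Embeddings` interface: `embed_query` prepends the query instruction, while `embed_documents` leaves the passages unchanged. A short sketch continuing the snippet above:
- ```python
- query_vector = model.embed_query("what is panda?")  # the instruction is prepended here
- doc_vectors = model.embed_documents(["The giant panda is a bear species endemic to China."])
- print(len(query_vector), len(doc_vectors[0]))  # embedding dimension
- ```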
-
2837
-
2838
- #### Using HuggingFace Transformers
2839
-
2840
- With the transformers package, you can use the model like this: first, pass your input through the transformer model; then, take the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.
2841
 
2842
- ```python
2843
- from transformers import AutoTokenizer, AutoModel
2844
- import torch
2845
- # Sentences we want sentence embeddings for
2846
- sentences = ["样例数据-1", "样例数据-2"]
2847
-
2848
- # Load model from HuggingFace Hub
2849
- tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-zh-v1.5')
2850
- model = AutoModel.from_pretrained('BAAI/bge-large-zh-v1.5')
2851
- model.eval()
2852
-
2853
- # Tokenize sentences
2854
- encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
2855
- # for an s2p (short query to long passage) retrieval task, add an instruction to each query (do not add an instruction to passages)
2856
- # encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')
2857
-
2858
- # Compute token embeddings
2859
- with torch.no_grad():
2860
- model_output = model(**encoded_input)
2861
- # Perform pooling. In this case, cls pooling.
2862
- sentence_embeddings = model_output[0][:, 0]
2863
- # normalize embeddings
2864
- sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
2865
- print("Sentence embeddings:", sentence_embeddings)
2866
- ```
2867
-
2868
- ### Usage for Reranker
2869
-
2870
- Unlike an embedding model, a reranker takes the question and document as input and directly outputs a similarity score instead of an embedding.
2871
- You can get a relevance score by feeding a query and a passage to the reranker.
2872
- The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
2873
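-
- If you need a bounded score, one common option is to map the raw logit through a sigmoid (a sketch; the raw value below is a made-up example):
- ```python
- import torch
-
- raw_score = 2.3  # hypothetical output of reranker.compute_score(['query', 'passage'])
- bounded = torch.sigmoid(torch.tensor(raw_score)).item()  # now in (0, 1)
- print(round(bounded, 4))
- ```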
-
2874
-
2875
- #### Using FlagEmbedding
2876
- ```
2877
- pip install -U FlagEmbedding
2878
- ```
2879
-
2880
- Get relevance scores (higher scores indicate more relevance):
2881
- ```python
2882
- from FlagEmbedding import FlagReranker
2883
- reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
2884
-
2885
- score = reranker.compute_score(['query', 'passage'])
2886
- print(score)
2887
-
2888
- scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
2889
- print(scores)
2890
- ```
2891
-
2892
-
2893
- #### Using Huggingface transformers
2894
-
2895
- ```python
2896
- import torch
2897
- from transformers import AutoModelForSequenceClassification, AutoTokenizer
2898
-
2899
- tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
2900
- model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
2901
- model.eval()
2902
-
2903
- pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
2904
- with torch.no_grad():
2905
- inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
2906
- scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
2907
- print(scores)
2908
- ```
2909
-
2910
- #### Usage of the ONNX files
2911
-
2912
- ```python
2913
- from optimum.onnxruntime import ORTModelForFeatureExtraction # type: ignore
2914
-
2915
- import torch
2916
- from transformers import AutoModel, AutoTokenizer
2917
-
2918
- tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')
2919
- model = AutoModel.from_pretrained('BAAI/bge-small-en-v1.5')
2920
- model_ort = ORTModelForFeatureExtraction.from_pretrained('BAAI/bge-small-en-v1.5', file_name="onnx/model.onnx")
2921
-
2922
- # Sentences we want sentence embeddings for
2923
- sentences = ["样例数据-1", "样例数据-2"]
2924
-
2925
- # Tokenize sentences
2926
- encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
2927
- # for an s2p (short query to long passage) retrieval task, add an instruction to each query (do not add an instruction to passages)
2928
- # encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')
2929
-
2930
- model_output_ort = model_ort(**encoded_input)
2931
- # Compute token embeddings
2932
- with torch.no_grad():
2933
- model_output = model(**encoded_input)
2934
-
2935
- # model_output and model_output_ort are identical
2936
-
2937
- ```
2938
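-
- To turn either backend's output into sentence embeddings, apply the same CLS pooling and normalization as in the Transformers example above. The check below is a sketch continuing the previous snippet, assuming both backends agree up to numerical tolerance:
- ```python
- # CLS pooling + L2 normalization for both backends
- ort_embeddings = torch.nn.functional.normalize(model_output_ort[0][:, 0], p=2, dim=1)
- pt_embeddings = torch.nn.functional.normalize(model_output[0][:, 0], p=2, dim=1)
- print(torch.allclose(ort_embeddings, pt_embeddings, atol=1e-5))
- ```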
-
2939
- #### Usage via infinity
2940
- It's also possible to deploy the ONNX files with the [infinity_emb](https://github.com/michaelfeil/infinity) pip package.
2941
  Recommended settings are `device="cuda", engine="torch"` with flash attention on GPU, and `device="cpu", engine="optimum"` for ONNX inference.
2942
 
2943
  ```python
@@ -2956,102 +40,9 @@ asyncio.run(main())
2956
  ```
2957
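
  A rough sketch of such usage (assuming the `AsyncEmbeddingEngine` API of the `infinity_emb` package at the time; treat the exact names and signatures as unverified):
  ```python
  import asyncio
  from infinity_emb import AsyncEmbeddingEngine  # assumed import path

  sentences = ["Embed this sentence via Infinity.", "Paris is in France."]
  engine = AsyncEmbeddingEngine(model_name_or_path="BAAI/bge-small-en-v1.5",
                                device="cpu", engine="optimum")  # assumed constructor

  async def main():
      async with engine:
          embeddings, usage = await engine.embed(sentences=sentences)
          print(len(embeddings), usage)
  asyncio.run(main())
  ```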
 
2958
 
2959
- ## Evaluation
2960
-
2961
- `baai-general-embedding` models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!**
2962
- For more details and evaluation tools, see our [scripts](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md).
2963
-
2964
- - **MTEB**:
2965
-
2966
- | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
2967
- |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
2968
- | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
2969
- | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
2970
- | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
2971
- | [bge-large-en](https://huggingface.co/BAAI/bge-large-en) | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
2972
- | [bge-base-en](https://huggingface.co/BAAI/bge-base-en) | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
2973
- | [gte-large](https://huggingface.co/thenlper/gte-large) | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
2974
- | [gte-base](https://huggingface.co/thenlper/gte-base) | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
2975
- | [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
2976
- | [bge-small-en](https://huggingface.co/BAAI/bge-small-en) | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
2977
- | [instructor-xl](https://huggingface.co/hkunlp/instructor-xl) | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
2978
- | [e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
2979
- | [gte-small](https://huggingface.co/thenlper/gte-small) | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
2980
- | [text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings) | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
2981
- | [e5-small-v2](https://huggingface.co/intfloat/e5-base-v2) | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
2982
- | [sentence-t5-xxl](https://huggingface.co/sentence-transformers/sentence-t5-xxl) | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
2983
- | [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
2984
- | [sgpt-bloom-7b1-msmarco](https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco) | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |
2985
-
2986
-
2987
-
2988
- - **C-MTEB**:
2989
- We created the C-MTEB benchmark for Chinese text embedding, which consists of 31 datasets across 6 tasks.
2990
- Please refer to [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md) for a detailed introduction.
2991
-
2992
- | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
2993
- |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
2994
- | [**BAAI/bge-large-zh-v1.5**](https://huggingface.co/BAAI/bge-large-zh-v1.5) | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
2995
- | [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
2996
- | [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
2997
- | [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
2998
- | [bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct) | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
2999
- | [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
3000
- | [multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
3001
- | [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
3002
- | [m3e-base](https://huggingface.co/moka-ai/m3e-base) | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
3003
- | [m3e-large](https://huggingface.co/moka-ai/m3e-large) | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
3004
- | [multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
3005
- | [multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
3006
- | [text-embedding-ada-002(OpenAI)](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
3007
- | [luotuo](https://huggingface.co/silk-road/luotuo-bert-medium) | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
3008
- | [text2vec-base](https://huggingface.co/shibing624/text2vec-base-chinese) | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
3009
- | [text2vec-large](https://huggingface.co/GanymedeNil/text2vec-large-chinese) | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |
3010
-
3011
-
3012
- - **Reranking**:
3013
- See [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/) for evaluation script.
3014
-
3015
- | Model | T2Reranking | T2RerankingZh2En\* | T2RerankingEn2Zh\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
3016
- |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
3017
- | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
3018
- | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
3019
- | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
3020
- | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
3021
- | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
3022
- | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
3023
- | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
3024
- | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
3025
- | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
3026
- | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 |
3027
-
3028
- \* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks
3029
-
3030
- ## Train
3031
-
3032
- ### BAAI Embedding
3033
-
3034
- We pre-train the models using [retromae](https://github.com/staoxiao/RetroMAE) and train them on large-scale pair data using contrastive learning.
3035
- **You can fine-tune the embedding model on your data following our [examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune).**
3036
- We also provide a [pre-train example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain).
3037
- Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned.
3038
- For more training details for bge, see [baai_general_embedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md).
3039
-
3040
-
3041
-
3042
- ### BGE Reranker
3043
-
3044
- A cross-encoder performs full attention over the input pair,
3045
- which is more accurate than an embedding model (i.e., a bi-encoder) but more time-consuming.
3046
- Therefore, it can be used to re-rank the top-k documents returned by an embedding model.
3047
- We train the cross-encoder on multilingual pair data.
3048
- The data format is the same as for the embedding model, so you can easily fine-tune it following our [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker).
3049
- For more details, please refer to [./FlagEmbedding/reranker/README.md](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker).
3050
-
3051
-
3052
  ## Contact
3053
  If you have any questions or suggestions related to this project, feel free to open an issue or pull request.
3054
- You can also email Shitao Xiao (stxiao@baai.ac.cn) and Zheng Liu (liuzheng@baai.ac.cn).
3055
 
3056
 
3057
  ## Citation
@@ -3059,16 +50,15 @@ You also can email Shitao Xiao(stxiao@baai.ac.cn) and Zheng Liu(liuzheng@baai.ac
3059
  If you find this repository useful, please consider giving it a star :star: and a citation.
3060
 
3061
  ```
3062
- @misc{bge_embedding,
3063
- title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
3064
- author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
3065
- year={2023},
3066
- eprint={2309.07597},
3067
- archivePrefix={arXiv},
3068
- primaryClass={cs.CL}
3069
  }
3070
  ```
3071
 
3072
  ## License
3073
- FlagEmbedding is licensed under the [MIT License](https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE). The released models can be used for commercial purposes free of charge.
3074
 
 
4
  - feature-extraction
5
  - sentence-similarity
6
  - transformers
 
7
  license: mit
8
  language:
9
  - en
10
  ---
11
 
12
 
13
+ <h1 align="center">Infinity Embedding Model</h1>
14
 
15
 
16
+ For more details, please refer to the GitHub repo: [Infinity](https://github.com/michaelfeil/infinity).
 
17
 
18
 
19
 
20
  ## Usage
21
 
22
+ ### Usage for Embedding Model via infinity
23
 
24
+ It's also possible to deploy the model files with the [infinity_emb](https://github.com/michaelfeil/infinity) pip package.
25
  Recommended settings are `device="cuda", engine="torch"` with flash attention on GPU, and `device="cpu", engine="optimum"` for ONNX inference.
26
 
27
  ```python
 
40
  ```
41
 
42
 
43
  ## Contact
44
  If you have any questions or suggestions related to this project, feel free to open an issue or pull request.
45
+ You can also email Michael Feil (infinity at michaelfeil.eu).
46
 
47
 
48
  ## Citation
 
50
  If you find this repository useful, please consider giving it a star :star: and a citation.
51
 
52
  ```
53
+ @software{Feil_Infinity_2023,
54
+ author = {Feil, Michael},
55
+ month = oct,
56
+ title = {{Infinity - To Embeddings and Beyond}},
57
+ url = {https://github.com/michaelfeil/infinity},
58
+ year = {2023}
 
59
  }
60
  ```
61
 
62
  ## License
63
+ Infinity is licensed under the [MIT License](https://github.com/michaelfeil/infinity/blob/master/LICENSE).
64