Commit af99964 by nreimers
Parent: 91a9877
README.md ADDED
@@ -0,0 +1,1103 @@
1
+ ---
2
+ tags:
3
+ - mteb
4
+ model-index:
5
+ - name: embed-english-light-v3.0
6
+ results:
7
+ - task:
8
+ type: Classification
9
+ dataset:
10
+ type: mteb/amazon_counterfactual
11
+ name: MTEB AmazonCounterfactualClassification (en)
12
+ config: en
13
+ split: test
14
+ revision: e8379541af4e31359cca9fbcf4b00f2671dba205
15
+ metrics:
16
+ - type: accuracy
17
+ value: 78.62686567164178
18
+ - type: ap
19
+ value: 43.50072127690769
20
+ - type: f1
21
+ value: 73.12414870629323
22
+ - task:
23
+ type: Classification
24
+ dataset:
25
+ type: mteb/amazon_polarity
26
+ name: MTEB AmazonPolarityClassification
27
+ config: default
28
+ split: test
29
+ revision: e2d317d38cd51312af73b3d32a06d1a08b442046
30
+ metrics:
31
+ - type: accuracy
32
+ value: 94.795
33
+ - type: ap
34
+ value: 92.14178233328848
35
+ - type: f1
36
+ value: 94.79269356571955
37
+ - task:
38
+ type: Classification
39
+ dataset:
40
+ type: mteb/amazon_reviews_multi
41
+ name: MTEB AmazonReviewsClassification (en)
42
+ config: en
43
+ split: test
44
+ revision: 1399c76144fd37290681b995c656ef9b2e06e26d
45
+ metrics:
46
+ - type: accuracy
47
+ value: 51.016000000000005
48
+ - type: f1
49
+ value: 48.9266470039522
50
+ - task:
51
+ type: Retrieval
52
+ dataset:
53
+ type: arguana
54
+ name: MTEB ArguAna
55
+ config: default
56
+ split: test
57
+ revision: None
58
+ metrics:
59
+ - type: ndcg_at_10
60
+ value: 50.806
61
+ - task:
62
+ type: Clustering
63
+ dataset:
64
+ type: mteb/arxiv-clustering-p2p
65
+ name: MTEB ArxivClusteringP2P
66
+ config: default
67
+ split: test
68
+ revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
69
+ metrics:
70
+ - type: v_measure
71
+ value: 46.19304218375896
72
+ - task:
73
+ type: Clustering
74
+ dataset:
75
+ type: mteb/arxiv-clustering-s2s
76
+ name: MTEB ArxivClusteringS2S
77
+ config: default
78
+ split: test
79
+ revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
80
+ metrics:
81
+ - type: v_measure
82
+ value: 37.57785041962193
83
+ - task:
84
+ type: Reranking
85
+ dataset:
86
+ type: mteb/askubuntudupquestions-reranking
87
+ name: MTEB AskUbuntuDupQuestions
88
+ config: default
89
+ split: test
90
+ revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
91
+ metrics:
92
+ - type: map
93
+ value: 60.11396377106911
94
+ - type: mrr
95
+ value: 72.9068284746955
96
+ - task:
97
+ type: STS
98
+ dataset:
99
+ type: mteb/biosses-sts
100
+ name: MTEB BIOSSES
101
+ config: default
102
+ split: test
103
+ revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
104
+ metrics:
105
+ - type: cos_sim_pearson
106
+ value: 82.59354737468067
107
+ - type: cos_sim_spearman
108
+ value: 81.71933190993215
109
+ - type: euclidean_pearson
110
+ value: 81.39212345994983
111
+ - type: euclidean_spearman
112
+ value: 81.71933190993215
113
+ - type: manhattan_pearson
114
+ value: 81.29257414603093
115
+ - type: manhattan_spearman
116
+ value: 81.80246633432691
117
+ - task:
118
+ type: Classification
119
+ dataset:
120
+ type: mteb/banking77
121
+ name: MTEB Banking77Classification
122
+ config: default
123
+ split: test
124
+ revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
125
+ metrics:
126
+ - type: accuracy
127
+ value: 79.69805194805193
128
+ - type: f1
129
+ value: 79.07431143559548
130
+ - task:
131
+ type: Clustering
132
+ dataset:
133
+ type: mteb/biorxiv-clustering-p2p
134
+ name: MTEB BiorxivClusteringP2P
135
+ config: default
136
+ split: test
137
+ revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
138
+ metrics:
139
+ - type: v_measure
140
+ value: 38.973417975095934
141
+ - task:
142
+ type: Clustering
143
+ dataset:
144
+ type: mteb/biorxiv-clustering-s2s
145
+ name: MTEB BiorxivClusteringS2S
146
+ config: default
147
+ split: test
148
+ revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
149
+ metrics:
150
+ - type: v_measure
151
+ value: 34.51608057107556
152
+ - task:
153
+ type: Retrieval
154
+ dataset:
155
+ type: BeIR/cqadupstack
156
+ name: MTEB CQADupstackAndroidRetrieval
157
+ config: default
158
+ split: test
159
+ revision: None
160
+ metrics:
161
+ - type: ndcg_at_10
162
+ value: 46.615
163
+ - task:
164
+ type: Retrieval
165
+ dataset:
166
+ type: BeIR/cqadupstack
167
+ name: MTEB CQADupstackEnglishRetrieval
168
+ config: default
169
+ split: test
170
+ revision: None
171
+ metrics:
172
+ - type: ndcg_at_10
173
+ value: 45.383
174
+ - task:
175
+ type: Retrieval
176
+ dataset:
177
+ type: BeIR/cqadupstack
178
+ name: MTEB CQADupstackGamingRetrieval
179
+ config: default
180
+ split: test
181
+ revision: None
182
+ metrics:
183
+ - type: ndcg_at_10
184
+ value: 57.062999999999995
185
+ - task:
186
+ type: Retrieval
187
+ dataset:
188
+ type: BeIR/cqadupstack
189
+ name: MTEB CQADupstackGisRetrieval
190
+ config: default
191
+ split: test
192
+ revision: None
193
+ metrics:
194
+ - type: ndcg_at_10
195
+ value: 37.201
196
+ - task:
197
+ type: Retrieval
198
+ dataset:
199
+ type: BeIR/cqadupstack
200
+ name: MTEB CQADupstackMathematicaRetrieval
201
+ config: default
202
+ split: test
203
+ revision: None
204
+ metrics:
205
+ - type: ndcg_at_10
206
+ value: 27.473
207
+ - task:
208
+ type: Retrieval
209
+ dataset:
210
+ type: BeIR/cqadupstack
211
+ name: MTEB CQADupstackPhysicsRetrieval
212
+ config: default
213
+ split: test
214
+ revision: None
215
+ metrics:
216
+ - type: ndcg_at_10
217
+ value: 41.868
218
+ - task:
219
+ type: Retrieval
220
+ dataset:
221
+ type: BeIR/cqadupstack
222
+ name: MTEB CQADupstackProgrammersRetrieval
223
+ config: default
224
+ split: test
225
+ revision: None
226
+ metrics:
227
+ - type: ndcg_at_10
228
+ value: 42.059000000000005
229
+ - task:
230
+ type: Retrieval
231
+ dataset:
232
+ type: BeIR/cqadupstack
233
+ name: MTEB CQADupstackRetrieval
234
+ config: default
235
+ split: test
236
+ revision: None
237
+ metrics:
238
+ - type: ndcg_at_10
239
+ value: 38.885416666666664
240
+ - task:
241
+ type: Retrieval
242
+ dataset:
243
+ type: BeIR/cqadupstack
244
+ name: MTEB CQADupstackStatsRetrieval
245
+ config: default
246
+ split: test
247
+ revision: None
248
+ metrics:
249
+ - type: ndcg_at_10
250
+ value: 32.134
251
+ - task:
252
+ type: Retrieval
253
+ dataset:
254
+ type: BeIR/cqadupstack
255
+ name: MTEB CQADupstackTexRetrieval
256
+ config: default
257
+ split: test
258
+ revision: None
259
+ metrics:
260
+ - type: ndcg_at_10
261
+ value: 28.052
262
+ - task:
263
+ type: Retrieval
264
+ dataset:
265
+ type: BeIR/cqadupstack
266
+ name: MTEB CQADupstackUnixRetrieval
267
+ config: default
268
+ split: test
269
+ revision: None
270
+ metrics:
271
+ - type: ndcg_at_10
272
+ value: 38.237
273
+ - task:
274
+ type: Retrieval
275
+ dataset:
276
+ type: BeIR/cqadupstack
277
+ name: MTEB CQADupstackWebmastersRetrieval
278
+ config: default
279
+ split: test
280
+ revision: None
281
+ metrics:
282
+ - type: ndcg_at_10
283
+ value: 37.875
284
+ - task:
285
+ type: Retrieval
286
+ dataset:
287
+ type: BeIR/cqadupstack
288
+ name: MTEB CQADupstackWordpressRetrieval
289
+ config: default
290
+ split: test
291
+ revision: None
292
+ metrics:
293
+ - type: ndcg_at_10
294
+ value: 32.665
295
+ - task:
296
+ type: Retrieval
297
+ dataset:
298
+ type: climate-fever
299
+ name: MTEB ClimateFEVER
300
+ config: default
301
+ split: test
302
+ revision: None
303
+ metrics:
304
+ - type: ndcg_at_10
305
+ value: 28.901
306
+ - task:
307
+ type: Retrieval
308
+ dataset:
309
+ type: dbpedia-entity
310
+ name: MTEB DBPedia
311
+ config: default
312
+ split: test
313
+ revision: None
314
+ metrics:
315
+ - type: ndcg_at_10
316
+ value: 41.028
317
+ - task:
318
+ type: Classification
319
+ dataset:
320
+ type: mteb/emotion
321
+ name: MTEB EmotionClassification
322
+ config: default
323
+ split: test
324
+ revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
325
+ metrics:
326
+ - type: accuracy
327
+ value: 52.745
328
+ - type: f1
329
+ value: 46.432564522368054
330
+ - task:
331
+ type: Retrieval
332
+ dataset:
333
+ type: fever
334
+ name: MTEB FEVER
335
+ config: default
336
+ split: test
337
+ revision: None
338
+ metrics:
339
+ - type: ndcg_at_10
340
+ value: 87.64
341
+ - task:
342
+ type: Retrieval
343
+ dataset:
344
+ type: fiqa
345
+ name: MTEB FiQA2018
346
+ config: default
347
+ split: test
348
+ revision: None
349
+ metrics:
350
+ - type: ndcg_at_10
351
+ value: 38.834999999999994
352
+ - task:
353
+ type: Retrieval
354
+ dataset:
355
+ type: hotpotqa
356
+ name: MTEB HotpotQA
357
+ config: default
358
+ split: test
359
+ revision: None
360
+ metrics:
361
+ - type: ndcg_at_10
362
+ value: 66.793
363
+ - task:
364
+ type: Classification
365
+ dataset:
366
+ type: mteb/imdb
367
+ name: MTEB ImdbClassification
368
+ config: default
369
+ split: test
370
+ revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
371
+ metrics:
372
+ - type: accuracy
373
+ value: 92.16680000000001
374
+ - type: ap
375
+ value: 88.9326260956379
376
+ - type: f1
377
+ value: 92.16197209455585
378
+ - task:
379
+ type: Retrieval
380
+ dataset:
381
+ type: msmarco
382
+ name: MTEB MSMARCO
383
+ config: default
384
+ split: test
385
+ revision: None
386
+ metrics:
387
+ - type: ndcg_at_10
388
+ value: 41.325
389
+ - task:
390
+ type: Classification
391
+ dataset:
392
+ type: mteb/mtop_domain
393
+ name: MTEB MTOPDomainClassification (en)
394
+ config: en
395
+ split: test
396
+ revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
397
+ metrics:
398
+ - type: accuracy
399
+ value: 93.62517099863202
400
+ - type: f1
401
+ value: 93.3852826127328
402
+ - task:
403
+ type: Classification
404
+ dataset:
405
+ type: mteb/mtop_intent
406
+ name: MTEB MTOPIntentClassification (en)
407
+ config: en
408
+ split: test
409
+ revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
410
+ metrics:
411
+ - type: accuracy
412
+ value: 64.93388052895577
413
+ - type: f1
414
+ value: 48.035548201830366
415
+ - task:
416
+ type: Classification
417
+ dataset:
418
+ type: mteb/amazon_massive_intent
419
+ name: MTEB MassiveIntentClassification (en)
420
+ config: en
421
+ split: test
422
+ revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
423
+ metrics:
424
+ - type: accuracy
425
+ value: 70.01344989912577
426
+ - type: f1
427
+ value: 68.01236893966525
428
+ - task:
429
+ type: Classification
430
+ dataset:
431
+ type: mteb/amazon_massive_scenario
432
+ name: MTEB MassiveScenarioClassification (en)
433
+ config: en
434
+ split: test
435
+ revision: 7d571f92784cd94a019292a1f45445077d0ef634
436
+ metrics:
437
+ - type: accuracy
438
+ value: 76.34498991257564
439
+ - type: f1
440
+ value: 75.72876911765213
441
+ - task:
442
+ type: Clustering
443
+ dataset:
444
+ type: mteb/medrxiv-clustering-p2p
445
+ name: MTEB MedrxivClusteringP2P
446
+ config: default
447
+ split: test
448
+ revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
449
+ metrics:
450
+ - type: v_measure
451
+ value: 37.66326759167091
452
+ - task:
453
+ type: Clustering
454
+ dataset:
455
+ type: mteb/medrxiv-clustering-s2s
456
+ name: MTEB MedrxivClusteringS2S
457
+ config: default
458
+ split: test
459
+ revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
460
+ metrics:
461
+ - type: v_measure
462
+ value: 33.53562430544494
463
+ - task:
464
+ type: Reranking
465
+ dataset:
466
+ type: mteb/mind_small
467
+ name: MTEB MindSmallReranking
468
+ config: default
469
+ split: test
470
+ revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
471
+ metrics:
472
+ - type: map
473
+ value: 31.86814320224619
474
+ - type: mrr
475
+ value: 33.02567757581291
476
+ - task:
477
+ type: Retrieval
478
+ dataset:
479
+ type: nfcorpus
480
+ name: MTEB NFCorpus
481
+ config: default
482
+ split: test
483
+ revision: None
484
+ metrics:
485
+ - type: ndcg_at_10
486
+ value: 33.649
487
+ - task:
488
+ type: Retrieval
489
+ dataset:
490
+ type: nq
491
+ name: MTEB NQ
492
+ config: default
493
+ split: test
494
+ revision: None
495
+ metrics:
496
+ - type: ndcg_at_10
497
+ value: 57.994
498
+ - task:
499
+ type: Retrieval
500
+ dataset:
501
+ type: quora
502
+ name: MTEB QuoraRetrieval
503
+ config: default
504
+ split: test
505
+ revision: None
506
+ metrics:
507
+ - type: ndcg_at_10
508
+ value: 88.115
509
+ - task:
510
+ type: Clustering
511
+ dataset:
512
+ type: mteb/reddit-clustering
513
+ name: MTEB RedditClustering
514
+ config: default
515
+ split: test
516
+ revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
517
+ metrics:
518
+ - type: v_measure
519
+ value: 53.4970929237201
520
+ - task:
521
+ type: Clustering
522
+ dataset:
523
+ type: mteb/reddit-clustering-p2p
524
+ name: MTEB RedditClusteringP2P
525
+ config: default
526
+ split: test
527
+ revision: 282350215ef01743dc01b456c7f5241fa8937f16
528
+ metrics:
529
+ - type: v_measure
530
+ value: 63.59086757472922
531
+ - task:
532
+ type: Retrieval
533
+ dataset:
534
+ type: scidocs
535
+ name: MTEB SCIDOCS
536
+ config: default
537
+ split: test
538
+ revision: None
539
+ metrics:
540
+ - type: ndcg_at_10
541
+ value: 18.098
542
+ - task:
543
+ type: STS
544
+ dataset:
545
+ type: mteb/sickr-sts
546
+ name: MTEB SICK-R
547
+ config: default
548
+ split: test
549
+ revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
550
+ metrics:
551
+ - type: cos_sim_pearson
552
+ value: 85.05019841005287
553
+ - type: cos_sim_spearman
554
+ value: 79.65240734965128
555
+ - type: euclidean_pearson
556
+ value: 82.33894047327843
557
+ - type: euclidean_spearman
558
+ value: 79.65240666088022
559
+ - type: manhattan_pearson
560
+ value: 82.33098051639543
561
+ - type: manhattan_spearman
562
+ value: 79.5592521956291
563
+ - task:
564
+ type: STS
565
+ dataset:
566
+ type: mteb/sts12-sts
567
+ name: MTEB STS12
568
+ config: default
569
+ split: test
570
+ revision: a0d554a64d88156834ff5ae9920b964011b16384
571
+ metrics:
572
+ - type: cos_sim_pearson
573
+ value: 81.28561469269728
574
+ - type: cos_sim_spearman
575
+ value: 72.6022866501722
576
+ - type: euclidean_pearson
577
+ value: 77.89616448619745
578
+ - type: euclidean_spearman
579
+ value: 72.6022866429173
580
+ - type: manhattan_pearson
581
+ value: 77.9073648819866
582
+ - type: manhattan_spearman
583
+ value: 72.6928162672852
584
+ - task:
585
+ type: STS
586
+ dataset:
587
+ type: mteb/sts13-sts
588
+ name: MTEB STS13
589
+ config: default
590
+ split: test
591
+ revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
592
+ metrics:
593
+ - type: cos_sim_pearson
594
+ value: 82.48271297318195
595
+ - type: cos_sim_spearman
596
+ value: 82.87639489647019
597
+ - type: euclidean_pearson
598
+ value: 82.24654676315204
599
+ - type: euclidean_spearman
600
+ value: 82.87642765399856
601
+ - type: manhattan_pearson
602
+ value: 82.19673632886851
603
+ - type: manhattan_spearman
604
+ value: 82.822727205448
605
+ - task:
606
+ type: STS
607
+ dataset:
608
+ type: mteb/sts14-sts
609
+ name: MTEB STS14
610
+ config: default
611
+ split: test
612
+ revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
613
+ metrics:
614
+ - type: cos_sim_pearson
615
+ value: 83.74140104895864
616
+ - type: cos_sim_spearman
617
+ value: 79.74024708732993
618
+ - type: euclidean_pearson
619
+ value: 82.50081856448949
620
+ - type: euclidean_spearman
621
+ value: 79.74024708732993
622
+ - type: manhattan_pearson
623
+ value: 82.36588991657912
624
+ - type: manhattan_spearman
625
+ value: 79.59022658604357
626
+ - task:
627
+ type: STS
628
+ dataset:
629
+ type: mteb/sts15-sts
630
+ name: MTEB STS15
631
+ config: default
632
+ split: test
633
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
634
+ metrics:
635
+ - type: cos_sim_pearson
636
+ value: 86.30124436614311
637
+ - type: cos_sim_spearman
638
+ value: 86.97688974734349
639
+ - type: euclidean_pearson
640
+ value: 86.36868875097032
641
+ - type: euclidean_spearman
642
+ value: 86.97688974734349
643
+ - type: manhattan_pearson
644
+ value: 86.37787059133234
645
+ - type: manhattan_spearman
646
+ value: 86.96666693570158
647
+ - task:
648
+ type: STS
649
+ dataset:
650
+ type: mteb/sts16-sts
651
+ name: MTEB STS16
652
+ config: default
653
+ split: test
654
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
655
+ metrics:
656
+ - type: cos_sim_pearson
657
+ value: 83.27590066451398
658
+ - type: cos_sim_spearman
659
+ value: 84.40811627278994
660
+ - type: euclidean_pearson
661
+ value: 83.77341566536141
662
+ - type: euclidean_spearman
663
+ value: 84.40811627278994
664
+ - type: manhattan_pearson
665
+ value: 83.72567664904311
666
+ - type: manhattan_spearman
667
+ value: 84.42172336387632
668
+ - task:
669
+ type: STS
670
+ dataset:
671
+ type: mteb/sts17-crosslingual-sts
672
+ name: MTEB STS17 (en-en)
673
+ config: en-en
674
+ split: test
675
+ revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
676
+ metrics:
677
+ - type: cos_sim_pearson
678
+ value: 89.13791942173916
679
+ - type: cos_sim_spearman
680
+ value: 89.22016928873572
681
+ - type: euclidean_pearson
682
+ value: 89.43583792557924
683
+ - type: euclidean_spearman
684
+ value: 89.22016928873572
685
+ - type: manhattan_pearson
686
+ value: 89.47307915863284
687
+ - type: manhattan_spearman
688
+ value: 89.20752264220539
689
+ - task:
690
+ type: STS
691
+ dataset:
692
+ type: mteb/sts22-crosslingual-sts
693
+ name: MTEB STS22 (en)
694
+ config: en
695
+ split: test
696
+ revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
697
+ metrics:
698
+ - type: cos_sim_pearson
699
+ value: 64.92003328655028
700
+ - type: cos_sim_spearman
701
+ value: 65.42027229611072
702
+ - type: euclidean_pearson
703
+ value: 66.68765284942059
704
+ - type: euclidean_spearman
705
+ value: 65.42027229611072
706
+ - type: manhattan_pearson
707
+ value: 66.85383496796447
708
+ - type: manhattan_spearman
709
+ value: 65.53490117706689
710
+ - task:
711
+ type: STS
712
+ dataset:
713
+ type: mteb/stsbenchmark-sts
714
+ name: MTEB STSBenchmark
715
+ config: default
716
+ split: test
717
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
718
+ metrics:
719
+ - type: cos_sim_pearson
720
+ value: 85.97445894753297
721
+ - type: cos_sim_spearman
722
+ value: 86.57651994952795
723
+ - type: euclidean_pearson
724
+ value: 86.7061296897819
725
+ - type: euclidean_spearman
726
+ value: 86.57651994952795
727
+ - type: manhattan_pearson
728
+ value: 86.66411668551642
729
+ - type: manhattan_spearman
730
+ value: 86.53200653755397
731
+ - task:
732
+ type: Reranking
733
+ dataset:
734
+ type: mteb/scidocs-reranking
735
+ name: MTEB SciDocsRR
736
+ config: default
737
+ split: test
738
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
739
+ metrics:
740
+ - type: map
741
+ value: 81.62235389081138
742
+ - type: mrr
743
+ value: 94.65811965811966
744
+ - task:
745
+ type: Retrieval
746
+ dataset:
747
+ type: scifact
748
+ name: MTEB SciFact
749
+ config: default
750
+ split: test
751
+ revision: None
752
+ metrics:
753
+ - type: ndcg_at_10
754
+ value: 66.687
755
+ - task:
756
+ type: PairClassification
757
+ dataset:
758
+ type: mteb/sprintduplicatequestions-pairclassification
759
+ name: MTEB SprintDuplicateQuestions
760
+ config: default
761
+ split: test
762
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
763
+ metrics:
764
+ - type: cos_sim_accuracy
765
+ value: 99.86435643564356
766
+ - type: cos_sim_ap
767
+ value: 96.59150882873165
768
+ - type: cos_sim_f1
769
+ value: 93.07030854830552
770
+ - type: cos_sim_precision
771
+ value: 94.16581371545547
772
+ - type: cos_sim_recall
773
+ value: 92.0
774
+ - type: dot_accuracy
775
+ value: 99.86435643564356
776
+ - type: dot_ap
777
+ value: 96.59150882873165
778
+ - type: dot_f1
779
+ value: 93.07030854830552
780
+ - type: dot_precision
781
+ value: 94.16581371545547
782
+ - type: dot_recall
783
+ value: 92.0
784
+ - type: euclidean_accuracy
785
+ value: 99.86435643564356
786
+ - type: euclidean_ap
787
+ value: 96.59150882873162
788
+ - type: euclidean_f1
789
+ value: 93.07030854830552
790
+ - type: euclidean_precision
791
+ value: 94.16581371545547
792
+ - type: euclidean_recall
793
+ value: 92.0
794
+ - type: manhattan_accuracy
795
+ value: 99.86336633663366
796
+ - type: manhattan_ap
797
+ value: 96.58123246795022
798
+ - type: manhattan_f1
799
+ value: 92.9591836734694
800
+ - type: manhattan_precision
801
+ value: 94.89583333333333
802
+ - type: manhattan_recall
803
+ value: 91.10000000000001
804
+ - type: max_accuracy
805
+ value: 99.86435643564356
806
+ - type: max_ap
807
+ value: 96.59150882873165
808
+ - type: max_f1
809
+ value: 93.07030854830552
810
+ - task:
811
+ type: Clustering
812
+ dataset:
813
+ type: mteb/stackexchange-clustering
814
+ name: MTEB StackExchangeClustering
815
+ config: default
816
+ split: test
817
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
818
+ metrics:
819
+ - type: v_measure
820
+ value: 62.938055854344455
821
+ - task:
822
+ type: Clustering
823
+ dataset:
824
+ type: mteb/stackexchange-clustering-p2p
825
+ name: MTEB StackExchangeClusteringP2P
826
+ config: default
827
+ split: test
828
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
829
+ metrics:
830
+ - type: v_measure
831
+ value: 36.479716154538224
832
+ - task:
833
+ type: Reranking
834
+ dataset:
835
+ type: mteb/stackoverflowdupquestions-reranking
836
+ name: MTEB StackOverflowDupQuestions
837
+ config: default
838
+ split: test
839
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
840
+ metrics:
841
+ - type: map
842
+ value: 50.75827388766867
843
+ - type: mrr
844
+ value: 51.65291305916306
845
+ - task:
846
+ type: Summarization
847
+ dataset:
848
+ type: mteb/summeval
849
+ name: MTEB SummEval
850
+ config: default
851
+ split: test
852
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
853
+ metrics:
854
+ - type: cos_sim_pearson
855
+ value: 31.81419421090782
856
+ - type: cos_sim_spearman
857
+ value: 31.287464634068492
858
+ - type: dot_pearson
859
+ value: 31.814195589790177
860
+ - type: dot_spearman
861
+ value: 31.287464634068492
862
+ - task:
863
+ type: Retrieval
864
+ dataset:
865
+ type: trec-covid
866
+ name: MTEB TRECCOVID
867
+ config: default
868
+ split: test
869
+ revision: None
870
+ metrics:
871
+ - type: ndcg_at_10
872
+ value: 79.364
873
+ - task:
874
+ type: Retrieval
875
+ dataset:
876
+ type: webis-touche2020
877
+ name: MTEB Touche2020
878
+ config: default
879
+ split: test
880
+ revision: None
881
+ metrics:
882
+ - type: ndcg_at_10
883
+ value: 31.927
884
+ - task:
885
+ type: Classification
886
+ dataset:
887
+ type: mteb/toxic_conversations_50k
888
+ name: MTEB ToxicConversationsClassification
889
+ config: default
890
+ split: test
891
+ revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
892
+ metrics:
893
+ - type: accuracy
894
+ value: 73.0414
895
+ - type: ap
896
+ value: 16.06723077348852
897
+ - type: f1
898
+ value: 56.73470421774399
899
+ - task:
900
+ type: Classification
901
+ dataset:
902
+ type: mteb/tweet_sentiment_extraction
903
+ name: MTEB TweetSentimentExtractionClassification
904
+ config: default
905
+ split: test
906
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
907
+ metrics:
908
+ - type: accuracy
909
+ value: 64.72269383135257
910
+ - type: f1
911
+ value: 64.70143593421479
912
+ - task:
913
+ type: Clustering
914
+ dataset:
915
+ type: mteb/twentynewsgroups-clustering
916
+ name: MTEB TwentyNewsgroupsClustering
917
+ config: default
918
+ split: test
919
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
920
+ metrics:
921
+ - type: v_measure
922
+ value: 46.06343037695152
923
+ - task:
924
+ type: PairClassification
925
+ dataset:
926
+ type: mteb/twittersemeval2015-pairclassification
927
+ name: MTEB TwitterSemEval2015
928
+ config: default
929
+ split: test
930
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
931
+ metrics:
932
+ - type: cos_sim_accuracy
933
+ value: 85.59337187816654
934
+ - type: cos_sim_ap
935
+ value: 72.23331527941706
936
+ - type: cos_sim_f1
937
+ value: 67.22915138175593
938
+ - type: cos_sim_precision
939
+ value: 62.64813126709207
940
+ - type: cos_sim_recall
941
+ value: 72.53298153034301
942
+ - type: dot_accuracy
943
+ value: 85.59337187816654
944
+ - type: dot_ap
945
+ value: 72.23332517262921
946
+ - type: dot_f1
947
+ value: 67.22915138175593
948
+ - type: dot_precision
949
+ value: 62.64813126709207
950
+ - type: dot_recall
951
+ value: 72.53298153034301
952
+ - type: euclidean_accuracy
953
+ value: 85.59337187816654
954
+ - type: euclidean_ap
955
+ value: 72.23331029091486
956
+ - type: euclidean_f1
957
+ value: 67.22915138175593
958
+ - type: euclidean_precision
959
+ value: 62.64813126709207
960
+ - type: euclidean_recall
961
+ value: 72.53298153034301
962
+ - type: manhattan_accuracy
963
+ value: 85.4622399713894
964
+ - type: manhattan_ap
965
+ value: 72.05180729774357
966
+ - type: manhattan_f1
967
+ value: 67.12683347713546
968
+ - type: manhattan_precision
969
+ value: 62.98866527874162
970
+ - type: manhattan_recall
971
+ value: 71.84696569920844
972
+ - type: max_accuracy
973
+ value: 85.59337187816654
974
+ - type: max_ap
975
+ value: 72.23332517262921
976
+ - type: max_f1
977
+ value: 67.22915138175593
978
+ - task:
979
+ type: PairClassification
980
+ dataset:
981
+ type: mteb/twitterurlcorpus-pairclassification
982
+ name: MTEB TwitterURLCorpus
983
+ config: default
984
+ split: test
985
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
986
+ metrics:
987
+ - type: cos_sim_accuracy
988
+ value: 89.08681647067955
989
+ - type: cos_sim_ap
990
+ value: 86.31913876322757
991
+ - type: cos_sim_f1
992
+ value: 78.678007640741
993
+ - type: cos_sim_precision
994
+ value: 73.95988616343678
995
+ - type: cos_sim_recall
996
+ value: 84.03911302740991
997
+ - type: dot_accuracy
998
+ value: 89.08681647067955
999
+ - type: dot_ap
1000
+ value: 86.31913976395484
1001
+ - type: dot_f1
1002
+ value: 78.678007640741
1003
+ - type: dot_precision
1004
+ value: 73.95988616343678
1005
+ - type: dot_recall
1006
+ value: 84.03911302740991
1007
+ - type: euclidean_accuracy
1008
+ value: 89.08681647067955
1009
+ - type: euclidean_ap
1010
+ value: 86.31913869004254
1011
+ - type: euclidean_f1
1012
+ value: 78.678007640741
1013
+ - type: euclidean_precision
1014
+ value: 73.95988616343678
1015
+ - type: euclidean_recall
1016
+ value: 84.03911302740991
1017
+ - type: manhattan_accuracy
1018
+ value: 89.06547133930997
1019
+ - type: manhattan_ap
1020
+ value: 86.24122868846949
1021
+ - type: manhattan_f1
1022
+ value: 78.74963094183643
1023
+ - type: manhattan_precision
1024
+ value: 75.62375956903884
1025
+ - type: manhattan_recall
1026
+ value: 82.14505697566985
1027
+ - type: max_accuracy
1028
+ value: 89.08681647067955
1029
+ - type: max_ap
1030
+ value: 86.31913976395484
1031
+ - type: max_f1
1032
+ value: 78.74963094183643
1033
+ ---
+
+
+ # Cohere embed-english-light-v3.0
+
+ This repository contains the tokenizer for the Cohere `embed-english-light-v3.0` model. See our blog post [Cohere Embed V3](https://txt.cohere.com/introducing-embed-v3/) for more details on this model.
+
+ You can use the embedding model via the Cohere API, via AWS SageMaker, or in your private deployments.
+
+ ## Usage Cohere API
+
+ The following code snippet shows how to use the Cohere API. Install the Cohere SDK via:
+ ```
+ pip install -U cohere
+ ```
+
+ Get your free API key at: www.cohere.com
+
+
+ ```python
+ # This snippet shows an example of how to use the Cohere Embed V3 models for semantic search.
+ # Make sure you have the Cohere SDK installed in at least v4.30: pip install -U cohere
+ # Get your API key from: www.cohere.com
+ import cohere
+ import numpy as np
+
+ cohere_key = "{YOUR_COHERE_API_KEY}"  # Get your API key from www.cohere.com
+ co = cohere.Client(cohere_key)
+
+ docs = ["The capital of France is Paris",
+         "PyTorch is a machine learning framework based on the Torch library.",
+         "The average cat lifespan is between 13-17 years"]
+
+
+ # Encode your documents with input type 'search_document'
+ doc_emb = co.embed(texts=docs, input_type="search_document", model="embed-english-light-v3.0").embeddings
+ doc_emb = np.asarray(doc_emb)
+
+
+ # Encode your query with input type 'search_query'
+ query = "What is Pytorch"
+ query_emb = co.embed(texts=[query], input_type="search_query", model="embed-english-light-v3.0").embeddings
+ query_emb = np.asarray(query_emb)
+ print(query_emb.shape)  # one row per query
+
+ # Compute the dot product between the query embedding and the document embeddings
+ scores = np.dot(query_emb, doc_emb.T)[0]
+
+ # Sort document indices by score, highest first
+ max_idx = np.argsort(-scores)
+
+ print(f"Query: {query}")
+ for idx in max_idx:
+     print(f"Score: {scores[idx]:.2f}")
+     print(docs[idx])
+     print("--------")
+ ```
+
+ ## Usage AWS SageMaker
+ The embedding model can be privately deployed in your AWS Cloud using our [AWS SageMaker marketplace offering](https://aws.amazon.com/marketplace/pp/prodview-z6huxszcqc25i). It runs privately in your VPC, with latencies as low as 5 ms for query encoding.
+
+ ## Usage AWS Bedrock
+ The model will soon also be available via AWS Bedrock. Stay tuned.
+
+ ## Private Deployment
+ Want to run the model on your own hardware? [Contact Sales](https://cohere.com/contact-sales) to learn more.
+
+ ## Supported Languages
+ This model was trained on nearly 1B English training pairs.
+
+ Evaluation results can be found in the [Embed V3.0 Benchmark Results spreadsheet](https://docs.google.com/spreadsheets/d/1w7gnHWMDBdEUrmHgSfDnGHJgVQE5aOiXCCwO3uNH_mI/edit?usp=sharing).
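
Note on the README's semantic-search snippet: the ranking step (dot product between query and document embeddings, then a descending sort) can be tried without an API key. The sketch below is only an illustration under stated assumptions: the vectors are made-up 3-dimensional values rather than real Cohere embeddings, and `rank_by_dot_product` is a hypothetical helper, not part of the Cohere SDK. It uses plain Python in place of `np.dot` and `np.argsort(-scores)`.

```python
def rank_by_dot_product(query_vec, doc_vecs):
    """Return (document indices sorted by descending score, list of scores)."""
    # Dot product of the query with every document vector
    scores = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
    # Highest score first; ties keep their original order
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order, scores


# Made-up vectors standing in for real embeddings (assumption, for illustration only)
toy_docs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
toy_query = [0.0, 1.0, 0.0]

order, scores = rank_by_dot_product(toy_query, toy_docs)
for idx in order:
    print(f"doc {idx}: score {scores[idx]:.2f}")
```

With real embeddings, `toy_docs` and `toy_query` would be replaced by the `.embeddings` lists returned by `co.embed`, as in the README.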
added_tokens.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "[COHERE_CLUSTERING_ID]": 30524,
+   "[COHERE_SEARCH_DOCUMENT_ID]": 30522,
+   "[COHERE_SEARCH_QUERY_ID]": 30523
+ }
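
Note on `added_tokens.json`: the three token names look like they mirror the API's `input_type` values (`search_document` and `search_query` appear in the README). That correspondence is an inference from the names, not something this commit states, and `token_for_input_type` below is a hypothetical helper. The sketch also checks that the IDs form one contiguous block starting at 30522, which would be consistent with the tokens being appended after a 30,522-entry base vocabulary (`vocab.txt` is not rendered in this diff, so that too is an assumption).

```python
import json

# Contents of added_tokens.json as shown in this commit
added_tokens = json.loads("""
{
  "[COHERE_CLUSTERING_ID]": 30524,
  "[COHERE_SEARCH_DOCUMENT_ID]": 30522,
  "[COHERE_SEARCH_QUERY_ID]": 30523
}
""")


def token_for_input_type(input_type):
    """Hypothetical lookup: build the special-token name from an input_type string."""
    name = f"[COHERE_{input_type.upper()}_ID]"
    return name, added_tokens[name]


# Check that the added IDs are consecutive integers
ids = sorted(added_tokens.values())
is_contiguous = ids == list(range(ids[0], ids[0] + len(ids)))

print(token_for_input_type("search_query"))
print("contiguous block:", is_contiguous, "starting at", ids[0])
```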
config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "n_positions": 512,
+   "hidden_dim": 384
+ }
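
Note on `config.json`: if `hidden_dim` is taken to be the size of each output embedding (an assumption; the file itself does not say how the value is used), a rough storage estimate for a vector index follows directly. The sketch below assumes uncompressed float32 storage (4 bytes per value) and ignores any index overhead; `index_size_bytes` is a hypothetical helper.

```python
import json

# Contents of config.json as shown in this commit
config = json.loads('{"n_positions": 512, "hidden_dim": 384}')


def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for num_vectors embeddings of size dim (float32 by default)."""
    return num_vectors * dim * bytes_per_value


per_vector = index_size_bytes(1, config["hidden_dim"])
one_million = index_size_bytes(1_000_000, config["hidden_dim"])

print(f"{per_vector} bytes per vector")
print(f"{one_million / 1e9:.3f} GB for one million vectors")
```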
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": null,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
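
Note on `tokenizer_config.json`: `model_max_length: 512` and `padding_side: "right"` describe how a batch of token IDs would be cut and padded. The sketch below is a simplified illustration, not the tokenizer's actual implementation: `truncate_and_pad` is a hypothetical helper, the `pad_id=0` default is an assumption (`vocab.txt` is not rendered in this diff), and real tokenizers also take care to keep special tokens such as `[SEP]` when truncating.

```python
def truncate_and_pad(token_ids, max_length=512, pad_id=0, padding_side="right"):
    """Cut a sequence to max_length, then pad it; returns (ids, attention_mask)."""
    ids = list(token_ids)[:max_length]
    mask = [1] * len(ids)          # 1 marks a real token
    n_pad = max_length - len(ids)  # 0 when the sequence was truncated
    if padding_side == "right":
        return ids + [pad_id] * n_pad, mask + [0] * n_pad
    return [pad_id] * n_pad + ids, [0] * n_pad + mask


# Short sequence: padded on the right
short_ids, short_mask = truncate_and_pad([5, 6, 7], max_length=5)
print(short_ids, short_mask)

# Long sequence: truncated to the 512-token limit
long_ids, long_mask = truncate_and_pad(range(600))
print(len(long_ids), sum(long_mask))
```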
vocab.txt ADDED
The diff for this file is too large to render. See raw diff