nreimers committed on
Commit
a73b099
1 Parent(s): 9048686
README.md ADDED
@@ -0,0 +1,1103 @@
---
tags:
- mteb
model-index:
- name: embed-english-v3.0
  results:
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_counterfactual
      name: MTEB AmazonCounterfactualClassification (en)
      config: en
      split: test
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
    metrics:
    - type: accuracy
      value: 81.29850746268656
    - type: ap
      value: 46.181772245676136
    - type: f1
      value: 75.47731234579823
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_polarity
      name: MTEB AmazonPolarityClassification
      config: default
      split: test
      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
    metrics:
    - type: accuracy
      value: 95.61824999999999
    - type: ap
      value: 93.22525741797098
    - type: f1
      value: 95.61627312544859
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (en)
      config: en
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 51.72
    - type: f1
      value: 50.529480725642465
  - task:
      type: Retrieval
    dataset:
      type: arguana
      name: MTEB ArguAna
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 61.521
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-p2p
      name: MTEB ArxivClusteringP2P
      config: default
      split: test
      revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
    metrics:
    - type: v_measure
      value: 49.173332266218914
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-s2s
      name: MTEB ArxivClusteringS2S
      config: default
      split: test
      revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
    metrics:
    - type: v_measure
      value: 42.1800504937582
  - task:
      type: Reranking
    dataset:
      type: mteb/askubuntudupquestions-reranking
      name: MTEB AskUbuntuDupQuestions
      config: default
      split: test
      revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
    metrics:
    - type: map
      value: 61.69942465283367
    - type: mrr
      value: 73.8089741898606
  - task:
      type: STS
    dataset:
      type: mteb/biosses-sts
      name: MTEB BIOSSES
      config: default
      split: test
      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
    metrics:
    - type: cos_sim_pearson
      value: 85.1805709775319
    - type: cos_sim_spearman
      value: 83.50310749422796
    - type: euclidean_pearson
      value: 83.57134970408762
    - type: euclidean_spearman
      value: 83.50310749422796
    - type: manhattan_pearson
      value: 83.422472116232
    - type: manhattan_spearman
      value: 83.35611619312422
  - task:
      type: Classification
    dataset:
      type: mteb/banking77
      name: MTEB Banking77Classification
      config: default
      split: test
      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
    metrics:
    - type: accuracy
      value: 85.52922077922078
    - type: f1
      value: 85.48530911742581
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-p2p
      name: MTEB BiorxivClusteringP2P
      config: default
      split: test
      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
    metrics:
    - type: v_measure
      value: 40.95750155360001
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-s2s
      name: MTEB BiorxivClusteringS2S
      config: default
      split: test
      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
    metrics:
    - type: v_measure
      value: 37.25334765305169
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackAndroidRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 50.037
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackEnglishRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 49.089
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackGamingRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 60.523
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackGisRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 39.293
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackMathematicaRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 30.414
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackPhysicsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 43.662
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackProgrammersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 43.667
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 41.53158333333334
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackStatsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 35.258
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackTexRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 30.866
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackUnixRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 40.643
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackWebmastersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 40.663
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackWordpressRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 34.264
  - task:
      type: Retrieval
    dataset:
      type: climate-fever
      name: MTEB ClimateFEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 38.433
  - task:
      type: Retrieval
    dataset:
      type: dbpedia-entity
      name: MTEB DBPedia
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 43.36
  - task:
      type: Classification
    dataset:
      type: mteb/emotion
      name: MTEB EmotionClassification
      config: default
      split: test
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
    metrics:
    - type: accuracy
      value: 51.574999999999996
    - type: f1
      value: 46.84362123583929
  - task:
      type: Retrieval
    dataset:
      type: fever
      name: MTEB FEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 88.966
  - task:
      type: Retrieval
    dataset:
      type: fiqa
      name: MTEB FiQA2018
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 42.189
  - task:
      type: Retrieval
    dataset:
      type: hotpotqa
      name: MTEB HotpotQA
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 70.723
  - task:
      type: Classification
    dataset:
      type: mteb/imdb
      name: MTEB ImdbClassification
      config: default
      split: test
      revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
    metrics:
    - type: accuracy
      value: 93.56920000000001
    - type: ap
      value: 90.56104192134326
    - type: f1
      value: 93.56471146876505
  - task:
      type: Retrieval
    dataset:
      type: msmarco
      name: MTEB MSMARCO
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 42.931000000000004
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (en)
      config: en
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 94.88372093023256
    - type: f1
      value: 94.64417024711646
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (en)
      config: en
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 76.52302781577748
    - type: f1
      value: 59.52848723786157
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (en)
      config: en
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 73.84330867518494
    - type: f1
      value: 72.18121296285702
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (en)
      config: en
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 78.73907195696033
    - type: f1
      value: 78.86079300338558
  - task:
      type: Clustering
    dataset:
      type: mteb/medrxiv-clustering-p2p
      name: MTEB MedrxivClusteringP2P
      config: default
      split: test
      revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
    metrics:
    - type: v_measure
      value: 37.40673427491627
  - task:
      type: Clustering
    dataset:
      type: mteb/medrxiv-clustering-s2s
      name: MTEB MedrxivClusteringS2S
      config: default
      split: test
      revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
    metrics:
    - type: v_measure
      value: 33.38936252583581
  - task:
      type: Reranking
    dataset:
      type: mteb/mind_small
      name: MTEB MindSmallReranking
      config: default
      split: test
      revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
    metrics:
    - type: map
      value: 32.67317850167471
    - type: mrr
      value: 33.9334102169254
  - task:
      type: Retrieval
    dataset:
      type: nfcorpus
      name: MTEB NFCorpus
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 38.574000000000005
  - task:
      type: Retrieval
    dataset:
      type: nq
      name: MTEB NQ
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 61.556
  - task:
      type: Retrieval
    dataset:
      type: quora
      name: MTEB QuoraRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 88.722
  - task:
      type: Clustering
    dataset:
      type: mteb/reddit-clustering
      name: MTEB RedditClustering
      config: default
      split: test
      revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
    metrics:
    - type: v_measure
      value: 58.45790556534654
  - task:
      type: Clustering
    dataset:
      type: mteb/reddit-clustering-p2p
      name: MTEB RedditClusteringP2P
      config: default
      split: test
      revision: 282350215ef01743dc01b456c7f5241fa8937f16
    metrics:
    - type: v_measure
      value: 66.35141658656822
  - task:
      type: Retrieval
    dataset:
      type: scidocs
      name: MTEB SCIDOCS
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 20.314
  - task:
      type: STS
    dataset:
      type: mteb/sickr-sts
      name: MTEB SICK-R
      config: default
      split: test
      revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
    metrics:
    - type: cos_sim_pearson
      value: 85.49945063881191
    - type: cos_sim_spearman
      value: 81.27177640994141
    - type: euclidean_pearson
      value: 82.74613694646263
    - type: euclidean_spearman
      value: 81.2717795980493
    - type: manhattan_pearson
      value: 82.75268512220467
    - type: manhattan_spearman
      value: 81.28362006796547
  - task:
      type: STS
    dataset:
      type: mteb/sts12-sts
      name: MTEB STS12
      config: default
      split: test
      revision: a0d554a64d88156834ff5ae9920b964011b16384
    metrics:
    - type: cos_sim_pearson
      value: 83.17562591888526
    - type: cos_sim_spearman
      value: 74.37099514810372
    - type: euclidean_pearson
      value: 79.97392043583372
    - type: euclidean_spearman
      value: 74.37103618585903
    - type: manhattan_pearson
      value: 80.00641585184354
    - type: manhattan_spearman
      value: 74.35403985608939
  - task:
      type: STS
    dataset:
      type: mteb/sts13-sts
      name: MTEB STS13
      config: default
      split: test
      revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
    metrics:
    - type: cos_sim_pearson
      value: 84.96937598668538
    - type: cos_sim_spearman
      value: 85.20181466598035
    - type: euclidean_pearson
      value: 84.51715977112744
    - type: euclidean_spearman
      value: 85.20181466598035
    - type: manhattan_pearson
      value: 84.45150037846719
    - type: manhattan_spearman
      value: 85.12338939049123
  - task:
      type: STS
    dataset:
      type: mteb/sts14-sts
      name: MTEB STS14
      config: default
      split: test
      revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
    metrics:
    - type: cos_sim_pearson
      value: 84.58787775650663
    - type: cos_sim_spearman
      value: 80.97859876561874
    - type: euclidean_pearson
      value: 83.38711461294801
    - type: euclidean_spearman
      value: 80.97859876561874
    - type: manhattan_pearson
      value: 83.34934127987394
    - type: manhattan_spearman
      value: 80.9556224835537
  - task:
      type: STS
    dataset:
      type: mteb/sts15-sts
      name: MTEB STS15
      config: default
      split: test
      revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
    metrics:
    - type: cos_sim_pearson
      value: 88.57387982528677
    - type: cos_sim_spearman
      value: 89.22666720704161
    - type: euclidean_pearson
      value: 88.50953296228646
    - type: euclidean_spearman
      value: 89.22666720704161
    - type: manhattan_pearson
      value: 88.45343635855095
    - type: manhattan_spearman
      value: 89.1638631562071
  - task:
      type: STS
    dataset:
      type: mteb/sts16-sts
      name: MTEB STS16
      config: default
      split: test
      revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
    metrics:
    - type: cos_sim_pearson
      value: 85.26071496425682
    - type: cos_sim_spearman
      value: 86.31740966379304
    - type: euclidean_pearson
      value: 85.85515938268887
    - type: euclidean_spearman
      value: 86.31740966379304
    - type: manhattan_pearson
      value: 85.80077191882177
    - type: manhattan_spearman
      value: 86.27885602957302
  - task:
      type: STS
    dataset:
      type: mteb/sts17-crosslingual-sts
      name: MTEB STS17 (en-en)
      config: en-en
      split: test
      revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
    metrics:
    - type: cos_sim_pearson
      value: 90.41413251495673
    - type: cos_sim_spearman
      value: 90.3370719075361
    - type: euclidean_pearson
      value: 90.5785973346113
    - type: euclidean_spearman
      value: 90.3370719075361
    - type: manhattan_pearson
      value: 90.5278703024898
    - type: manhattan_spearman
      value: 90.23870483011629
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (en)
      config: en
      split: test
      revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
    metrics:
    - type: cos_sim_pearson
      value: 66.1571023517868
    - type: cos_sim_spearman
      value: 66.42297916256133
    - type: euclidean_pearson
      value: 67.55835224919745
    - type: euclidean_spearman
      value: 66.42297916256133
    - type: manhattan_pearson
      value: 67.40537247802385
    - type: manhattan_spearman
      value: 66.26259339863576
  - task:
      type: STS
    dataset:
      type: mteb/stsbenchmark-sts
      name: MTEB STSBenchmark
      config: default
      split: test
      revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
    metrics:
    - type: cos_sim_pearson
      value: 87.4251695055504
    - type: cos_sim_spearman
      value: 88.54881886307972
    - type: euclidean_pearson
      value: 88.54094330250571
    - type: euclidean_spearman
      value: 88.54881886307972
    - type: manhattan_pearson
      value: 88.49069549839685
    - type: manhattan_spearman
      value: 88.49149164694148
  - task:
      type: Reranking
    dataset:
      type: mteb/scidocs-reranking
      name: MTEB SciDocsRR
      config: default
      split: test
      revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
    metrics:
    - type: map
      value: 85.19974508901711
    - type: mrr
      value: 95.95137342686361
  - task:
      type: Retrieval
    dataset:
      type: scifact
      name: MTEB SciFact
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 71.825
  - task:
      type: PairClassification
    dataset:
      type: mteb/sprintduplicatequestions-pairclassification
      name: MTEB SprintDuplicateQuestions
      config: default
      split: test
      revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
    metrics:
    - type: cos_sim_accuracy
      value: 99.85346534653465
    - type: cos_sim_ap
      value: 96.2457455868878
    - type: cos_sim_f1
      value: 92.49492900608519
    - type: cos_sim_precision
      value: 93.82716049382715
    - type: cos_sim_recall
      value: 91.2
    - type: dot_accuracy
      value: 99.85346534653465
    - type: dot_ap
      value: 96.24574558688776
    - type: dot_f1
      value: 92.49492900608519
    - type: dot_precision
      value: 93.82716049382715
    - type: dot_recall
      value: 91.2
    - type: euclidean_accuracy
      value: 99.85346534653465
    - type: euclidean_ap
      value: 96.2457455868878
    - type: euclidean_f1
      value: 92.49492900608519
    - type: euclidean_precision
      value: 93.82716049382715
    - type: euclidean_recall
      value: 91.2
    - type: manhattan_accuracy
      value: 99.85643564356435
    - type: manhattan_ap
      value: 96.24594126679709
    - type: manhattan_f1
      value: 92.63585576434738
    - type: manhattan_precision
      value: 94.11764705882352
    - type: manhattan_recall
      value: 91.2
    - type: max_accuracy
      value: 99.85643564356435
    - type: max_ap
      value: 96.24594126679709
    - type: max_f1
      value: 92.63585576434738
  - task:
      type: Clustering
    dataset:
      type: mteb/stackexchange-clustering
      name: MTEB StackExchangeClustering
      config: default
      split: test
      revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
    metrics:
    - type: v_measure
      value: 68.41861859721674
  - task:
      type: Clustering
    dataset:
      type: mteb/stackexchange-clustering-p2p
      name: MTEB StackExchangeClusteringP2P
      config: default
      split: test
      revision: 815ca46b2622cec33ccafc3735d572c266efdb44
    metrics:
    - type: v_measure
      value: 37.51202861563424
  - task:
      type: Reranking
    dataset:
      type: mteb/stackoverflowdupquestions-reranking
      name: MTEB StackOverflowDupQuestions
      config: default
      split: test
      revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
    metrics:
    - type: map
      value: 52.48207537634766
    - type: mrr
      value: 53.36204747050335
  - task:
      type: Summarization
    dataset:
      type: mteb/summeval
      name: MTEB SummEval
      config: default
      split: test
      revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
    metrics:
    - type: cos_sim_pearson
      value: 30.397150340510397
    - type: cos_sim_spearman
      value: 30.180928192386
    - type: dot_pearson
      value: 30.397148822378796
    - type: dot_spearman
      value: 30.180928192386
  - task:
      type: Retrieval
    dataset:
      type: trec-covid
      name: MTEB TRECCOVID
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 81.919
  - task:
      type: Retrieval
    dataset:
      type: webis-touche2020
      name: MTEB Touche2020
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 32.419
  - task:
      type: Classification
    dataset:
      type: mteb/toxic_conversations_50k
      name: MTEB ToxicConversationsClassification
      config: default
      split: test
      revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
    metrics:
    - type: accuracy
      value: 72.613
    - type: ap
      value: 15.696112954573444
    - type: f1
      value: 56.30148693392767
  - task:
      type: Classification
    dataset:
      type: mteb/tweet_sentiment_extraction
      name: MTEB TweetSentimentExtractionClassification
      config: default
      split: test
      revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
    metrics:
    - type: accuracy
      value: 62.02037351443125
    - type: f1
      value: 62.31189055427593
  - task:
      type: Clustering
    dataset:
      type: mteb/twentynewsgroups-clustering
      name: MTEB TwentyNewsgroupsClustering
      config: default
      split: test
      revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
    metrics:
    - type: v_measure
      value: 50.64186455543417
  - task:
      type: PairClassification
    dataset:
      type: mteb/twittersemeval2015-pairclassification
      name: MTEB TwitterSemEval2015
      config: default
      split: test
      revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
    metrics:
    - type: cos_sim_accuracy
      value: 86.27883411813792
    - type: cos_sim_ap
      value: 74.80076733774258
    - type: cos_sim_f1
      value: 68.97989210397255
    - type: cos_sim_precision
      value: 64.42968392120935
    - type: cos_sim_recall
      value: 74.22163588390501
    - type: dot_accuracy
      value: 86.27883411813792
    - type: dot_ap
      value: 74.80076608107143
    - type: dot_f1
      value: 68.97989210397255
    - type: dot_precision
      value: 64.42968392120935
    - type: dot_recall
      value: 74.22163588390501
    - type: euclidean_accuracy
      value: 86.27883411813792
    - type: euclidean_ap
      value: 74.80076820459502
    - type: euclidean_f1
      value: 68.97989210397255
    - type: euclidean_precision
      value: 64.42968392120935
    - type: euclidean_recall
      value: 74.22163588390501
    - type: manhattan_accuracy
      value: 86.23711032961793
    - type: manhattan_ap
      value: 74.73958348950038
    - type: manhattan_f1
      value: 68.76052948255115
    - type: manhattan_precision
      value: 63.207964601769916
    - type: manhattan_recall
      value: 75.3825857519789
    - type: max_accuracy
      value: 86.27883411813792
    - type: max_ap
      value: 74.80076820459502
    - type: max_f1
      value: 68.97989210397255
  - task:
      type: PairClassification
    dataset:
      type: mteb/twitterurlcorpus-pairclassification
      name: MTEB TwitterURLCorpus
      config: default
      split: test
      revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
    metrics:
    - type: cos_sim_accuracy
      value: 89.09263787014399
    - type: cos_sim_ap
      value: 86.46378381763645
    - type: cos_sim_f1
      value: 78.67838784176413
    - type: cos_sim_precision
      value: 76.20868812238419
    - type: cos_sim_recall
      value: 81.3135201724669
    - type: dot_accuracy
      value: 89.09263787014399
    - type: dot_ap
      value: 86.46378353247907
    - type: dot_f1
      value: 78.67838784176413
    - type: dot_precision
      value: 76.20868812238419
    - type: dot_recall
      value: 81.3135201724669
    - type: euclidean_accuracy
      value: 89.09263787014399
    - type: euclidean_ap
      value: 86.46378511891255
    - type: euclidean_f1
      value: 78.67838784176413
    - type: euclidean_precision
      value: 76.20868812238419
    - type: euclidean_recall
      value: 81.3135201724669
    - type: manhattan_accuracy
      value: 89.09069740365584
    - type: manhattan_ap
      value: 86.44864502475154
    - type: manhattan_f1
      value: 78.67372818141132
    - type: manhattan_precision
      value: 76.29484953703704
    - type: manhattan_recall
      value: 81.20572836464429
    - type: max_accuracy
      value: 89.09263787014399
    - type: max_ap
      value: 86.46378511891255
    - type: max_f1
      value: 78.67838784176413
---

# Cohere embed-english-v3.0

This repository contains the tokenizer for the Cohere `embed-english-v3.0` model. See our blog post [Cohere Embed V3](https://txt.cohere.com/introducing-embed-v3/) for more details on this model.

You can use the embedding model via the Cohere API, on AWS SageMaker, or in your private deployments.

## Usage Cohere API

The following code snippet shows how to use the Cohere API. Install the Cohere SDK via:
```
pip install -U cohere
```

Get your free API key at: www.cohere.com


```python
# This snippet shows an example of how to use the Cohere Embed V3 models for semantic search.
# Make sure the Cohere SDK is installed in at least v4.30: pip install -U cohere
# Get your API key from: www.cohere.com
import cohere
import numpy as np

cohere_key = "{YOUR_COHERE_API_KEY}"  # Get your API key from www.cohere.com
co = cohere.Client(cohere_key)

docs = ["The capital of France is Paris",
        "PyTorch is a machine learning framework based on the Torch library.",
        "The average cat lifespan is between 13-17 years"]


# Encode your documents with input type 'search_document'
doc_emb = co.embed(docs, input_type="search_document", model="embed-english-v3.0").embeddings
doc_emb = np.asarray(doc_emb)


# Encode your query with input type 'search_query'
query = "What is Pytorch"
query_emb = co.embed([query], input_type="search_query", model="embed-english-v3.0").embeddings
query_emb = np.asarray(query_emb)

# Compute the dot product between the query embedding and each document embedding
scores = np.dot(query_emb, doc_emb.T)[0]

# Sort the documents by descending score
max_idx = np.argsort(-scores)

print(f"Query: {query}")
for idx in max_idx:
    print(f"Score: {scores[idx]:.2f}")
    print(docs[idx])
    print("--------")
```
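The ranking above uses raw dot products. If you prefer scores that are invariant to vector length, you can normalize the embeddings and rank by cosine similarity instead. A minimal NumPy sketch, using small illustrative stand-in vectors rather than real API output:

```python
import numpy as np

def cosine_rank(query_emb: np.ndarray, doc_emb: np.ndarray) -> np.ndarray:
    """Return document indices sorted by descending cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    scores = d @ q          # cosine similarity of each document to the query
    return np.argsort(-scores)

# Illustrative stand-ins for the output of co.embed(...):
query_emb = np.array([1.0, 0.0, 1.0])
doc_emb = np.array([[1.0, 0.1, 0.9],   # similar direction to the query
                    [0.0, 1.0, 0.0],   # orthogonal to the query
                    [0.5, 0.5, 0.5]])  # somewhere in between

print(cosine_rank(query_emb, doc_emb))  # best match first
```

For unit-normalized embeddings, dot product and cosine similarity give identical rankings, so this only changes the result when vector norms differ.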

## Usage AWS SageMaker
The embedding model can be privately deployed in your AWS cloud using our [AWS SageMaker marketplace offering](https://aws.amazon.com/marketplace/pp/prodview-z6huxszcqc25i). It runs privately in your VPC, with latencies as low as 5 ms for query encoding.

## Usage AWS Bedrock
The model will soon also be available via AWS Bedrock. Stay tuned!

## Private Deployment
Want to run the model on your own hardware? [Contact Sales](https://cohere.com/contact-sales) to learn more.

## Supported Languages
This model was trained on nearly 1B English training pairs.

Evaluation results can be found in the [Embed V3.0 Benchmark Results spreadsheet](https://docs.google.com/spreadsheets/d/1w7gnHWMDBdEUrmHgSfDnGHJgVQE5aOiXCCwO3uNH_mI/edit?usp=sharing).
added_tokens.json ADDED
@@ -0,0 +1,5 @@
{
  "[COHERE_CLUSTERING_ID]": 30524,
  "[COHERE_SEARCH_DOCUMENT_ID]": 30522,
  "[COHERE_SEARCH_QUERY_ID]": 30523
}
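The three added tokens above correspond to the model's input types (`search_document`, `search_query`, `clustering`). How the model consumes these markers internally is not documented here; as a purely illustrative sketch of the id mapping taken from this file (the `marker_id` helper is hypothetical):

```python
# Token ids copied from added_tokens.json; the mapping from input_type name
# to marker token is an assumption made for illustration only.
ADDED_TOKENS = {
    "search_document": 30522,  # [COHERE_SEARCH_DOCUMENT_ID]
    "search_query": 30523,     # [COHERE_SEARCH_QUERY_ID]
    "clustering": 30524,       # [COHERE_CLUSTERING_ID]
}

def marker_id(input_type: str) -> int:
    """Look up the added-token id for a given input type."""
    return ADDED_TOKENS[input_type]

print(marker_id("search_query"))  # 30523
```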
config.json ADDED
@@ -0,0 +1,4 @@
{
  "n_positions": 512,
  "hidden_dim": 1024
}
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,20 @@
{
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "mask_token": "[MASK]",
  "max_length": 512,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "[PAD]",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "[SEP]",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
}
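The configuration above caps inputs at `model_max_length: 512` tokens and truncates on the right. A rough sketch of that truncation rule applied to a pre-tokenized list (plain Python for illustration; the real tokenizer is a WordPiece `BertTokenizer`, not this whitespace stand-in):

```python
def truncate_tokens(tokens, model_max_length=512, truncation_side="right"):
    """Keep at most model_max_length tokens, dropping from the chosen side,
    mirroring the model_max_length / truncation_side settings above."""
    if len(tokens) <= model_max_length:
        return tokens
    if truncation_side == "right":
        return tokens[:model_max_length]   # drop tokens from the end
    return tokens[-model_max_length:]      # drop tokens from the start

tokens = [f"tok{i}" for i in range(600)]
print(len(truncate_tokens(tokens)))  # 512
```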
vocab.txt ADDED
The diff for this file is too large to render. See raw diff