michaelfeil committed on
Commit 2de9369 · 1 Parent(s): 96da117

Upload BAAI/bge-base-en-v1.5 ctranslate2 weights

README.md ADDED
@@ -0,0 +1,3081 @@
+ ---
+ tags:
+ - ctranslate2
+ - int8
+ - float16
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - transformers
+ - mteb
+ model-index:
+ - name: bge-base-en-v1.5
+   results:
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_counterfactual
+       name: MTEB AmazonCounterfactualClassification (en)
+       config: en
+       split: test
+       revision: e8379541af4e31359cca9fbcf4b00f2671dba205
+     metrics:
+     - type: accuracy
+       value: 76.14925373134328
+     - type: ap
+       value: 39.32336517995478
+     - type: f1
+       value: 70.16902252611425
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_polarity
+       name: MTEB AmazonPolarityClassification
+       config: default
+       split: test
+       revision: e2d317d38cd51312af73b3d32a06d1a08b442046
+     metrics:
+     - type: accuracy
+       value: 93.386825
+     - type: ap
+       value: 90.21276917991995
+     - type: f1
+       value: 93.37741030006174
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_reviews_multi
+       name: MTEB AmazonReviewsClassification (en)
+       config: en
+       split: test
+       revision: 1399c76144fd37290681b995c656ef9b2e06e26d
+     metrics:
+     - type: accuracy
+       value: 48.846000000000004
+     - type: f1
+       value: 48.14646269778261
+   - task:
+       type: Retrieval
+     dataset:
+       type: arguana
+       name: MTEB ArguAna
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 40.754000000000005
+     - type: map_at_10
+       value: 55.761
+     - type: map_at_100
+       value: 56.330999999999996
+     - type: map_at_1000
+       value: 56.333999999999996
+     - type: map_at_3
+       value: 51.92
+     - type: map_at_5
+       value: 54.010999999999996
+     - type: mrr_at_1
+       value: 41.181
+     - type: mrr_at_10
+       value: 55.967999999999996
+     - type: mrr_at_100
+       value: 56.538
+     - type: mrr_at_1000
+       value: 56.542
+     - type: mrr_at_3
+       value: 51.980000000000004
+     - type: mrr_at_5
+       value: 54.208999999999996
+     - type: ndcg_at_1
+       value: 40.754000000000005
+     - type: ndcg_at_10
+       value: 63.605000000000004
+     - type: ndcg_at_100
+       value: 66.05199999999999
+     - type: ndcg_at_1000
+       value: 66.12
+     - type: ndcg_at_3
+       value: 55.708
+     - type: ndcg_at_5
+       value: 59.452000000000005
+     - type: precision_at_1
+       value: 40.754000000000005
+     - type: precision_at_10
+       value: 8.841000000000001
+     - type: precision_at_100
+       value: 0.991
+     - type: precision_at_1000
+       value: 0.1
+     - type: precision_at_3
+       value: 22.238
+     - type: precision_at_5
+       value: 15.149000000000001
+     - type: recall_at_1
+       value: 40.754000000000005
+     - type: recall_at_10
+       value: 88.407
+     - type: recall_at_100
+       value: 99.14699999999999
+     - type: recall_at_1000
+       value: 99.644
+     - type: recall_at_3
+       value: 66.714
+     - type: recall_at_5
+       value: 75.747
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/arxiv-clustering-p2p
+       name: MTEB ArxivClusteringP2P
+       config: default
+       split: test
+       revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
+     metrics:
+     - type: v_measure
+       value: 48.74884539679369
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/arxiv-clustering-s2s
+       name: MTEB ArxivClusteringS2S
+       config: default
+       split: test
+       revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
+     metrics:
+     - type: v_measure
+       value: 42.8075893810716
+   - task:
+       type: Reranking
+     dataset:
+       type: mteb/askubuntudupquestions-reranking
+       name: MTEB AskUbuntuDupQuestions
+       config: default
+       split: test
+       revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
+     metrics:
+     - type: map
+       value: 62.128470519187736
+     - type: mrr
+       value: 74.28065778481289
+   - task:
+       type: STS
+     dataset:
+       type: mteb/biosses-sts
+       name: MTEB BIOSSES
+       config: default
+       split: test
+       revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
+     metrics:
+     - type: cos_sim_pearson
+       value: 89.24629081484655
+     - type: cos_sim_spearman
+       value: 86.93752309911496
+     - type: euclidean_pearson
+       value: 87.58589628573816
+     - type: euclidean_spearman
+       value: 88.05622328825284
+     - type: manhattan_pearson
+       value: 87.5594959805773
+     - type: manhattan_spearman
+       value: 88.19658793233961
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/banking77
+       name: MTEB Banking77Classification
+       config: default
+       split: test
+       revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
+     metrics:
+     - type: accuracy
+       value: 86.9512987012987
+     - type: f1
+       value: 86.92515357973708
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/biorxiv-clustering-p2p
+       name: MTEB BiorxivClusteringP2P
+       config: default
+       split: test
+       revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
+     metrics:
+     - type: v_measure
+       value: 39.10263762928872
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/biorxiv-clustering-s2s
+       name: MTEB BiorxivClusteringS2S
+       config: default
+       split: test
+       revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
+     metrics:
+     - type: v_measure
+       value: 36.69711517426737
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackAndroidRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 32.327
+     - type: map_at_10
+       value: 44.099
+     - type: map_at_100
+       value: 45.525
+     - type: map_at_1000
+       value: 45.641999999999996
+     - type: map_at_3
+       value: 40.47
+     - type: map_at_5
+       value: 42.36
+     - type: mrr_at_1
+       value: 39.199
+     - type: mrr_at_10
+       value: 49.651
+     - type: mrr_at_100
+       value: 50.29
+     - type: mrr_at_1000
+       value: 50.329
+     - type: mrr_at_3
+       value: 46.924
+     - type: mrr_at_5
+       value: 48.548
+     - type: ndcg_at_1
+       value: 39.199
+     - type: ndcg_at_10
+       value: 50.773
+     - type: ndcg_at_100
+       value: 55.67999999999999
+     - type: ndcg_at_1000
+       value: 57.495
+     - type: ndcg_at_3
+       value: 45.513999999999996
+     - type: ndcg_at_5
+       value: 47.703
+     - type: precision_at_1
+       value: 39.199
+     - type: precision_at_10
+       value: 9.914000000000001
+     - type: precision_at_100
+       value: 1.5310000000000001
+     - type: precision_at_1000
+       value: 0.198
+     - type: precision_at_3
+       value: 21.984
+     - type: precision_at_5
+       value: 15.737000000000002
+     - type: recall_at_1
+       value: 32.327
+     - type: recall_at_10
+       value: 63.743
+     - type: recall_at_100
+       value: 84.538
+     - type: recall_at_1000
+       value: 96.089
+     - type: recall_at_3
+       value: 48.065000000000005
+     - type: recall_at_5
+       value: 54.519
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackEnglishRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 32.671
+     - type: map_at_10
+       value: 42.954
+     - type: map_at_100
+       value: 44.151
+     - type: map_at_1000
+       value: 44.287
+     - type: map_at_3
+       value: 39.912
+     - type: map_at_5
+       value: 41.798
+     - type: mrr_at_1
+       value: 41.465
+     - type: mrr_at_10
+       value: 49.351
+     - type: mrr_at_100
+       value: 49.980000000000004
+     - type: mrr_at_1000
+       value: 50.016000000000005
+     - type: mrr_at_3
+       value: 47.144000000000005
+     - type: mrr_at_5
+       value: 48.592999999999996
+     - type: ndcg_at_1
+       value: 41.465
+     - type: ndcg_at_10
+       value: 48.565999999999995
+     - type: ndcg_at_100
+       value: 52.76499999999999
+     - type: ndcg_at_1000
+       value: 54.749
+     - type: ndcg_at_3
+       value: 44.57
+     - type: ndcg_at_5
+       value: 46.759
+     - type: precision_at_1
+       value: 41.465
+     - type: precision_at_10
+       value: 9.107999999999999
+     - type: precision_at_100
+       value: 1.433
+     - type: precision_at_1000
+       value: 0.191
+     - type: precision_at_3
+       value: 21.423000000000002
+     - type: precision_at_5
+       value: 15.414
+     - type: recall_at_1
+       value: 32.671
+     - type: recall_at_10
+       value: 57.738
+     - type: recall_at_100
+       value: 75.86500000000001
+     - type: recall_at_1000
+       value: 88.36
+     - type: recall_at_3
+       value: 45.626
+     - type: recall_at_5
+       value: 51.812000000000005
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackGamingRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 41.185
+     - type: map_at_10
+       value: 53.929
+     - type: map_at_100
+       value: 54.92
+     - type: map_at_1000
+       value: 54.967999999999996
+     - type: map_at_3
+       value: 50.70400000000001
+     - type: map_at_5
+       value: 52.673
+     - type: mrr_at_1
+       value: 47.398
+     - type: mrr_at_10
+       value: 57.303000000000004
+     - type: mrr_at_100
+       value: 57.959
+     - type: mrr_at_1000
+       value: 57.985
+     - type: mrr_at_3
+       value: 54.932
+     - type: mrr_at_5
+       value: 56.464999999999996
+     - type: ndcg_at_1
+       value: 47.398
+     - type: ndcg_at_10
+       value: 59.653
+     - type: ndcg_at_100
+       value: 63.627
+     - type: ndcg_at_1000
+       value: 64.596
+     - type: ndcg_at_3
+       value: 54.455
+     - type: ndcg_at_5
+       value: 57.245000000000005
+     - type: precision_at_1
+       value: 47.398
+     - type: precision_at_10
+       value: 9.524000000000001
+     - type: precision_at_100
+       value: 1.243
+     - type: precision_at_1000
+       value: 0.13699999999999998
+     - type: precision_at_3
+       value: 24.389
+     - type: precision_at_5
+       value: 16.752
+     - type: recall_at_1
+       value: 41.185
+     - type: recall_at_10
+       value: 73.193
+     - type: recall_at_100
+       value: 90.357
+     - type: recall_at_1000
+       value: 97.253
+     - type: recall_at_3
+       value: 59.199999999999996
+     - type: recall_at_5
+       value: 66.118
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackGisRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 27.27
+     - type: map_at_10
+       value: 36.223
+     - type: map_at_100
+       value: 37.218
+     - type: map_at_1000
+       value: 37.293
+     - type: map_at_3
+       value: 33.503
+     - type: map_at_5
+       value: 35.097
+     - type: mrr_at_1
+       value: 29.492
+     - type: mrr_at_10
+       value: 38.352000000000004
+     - type: mrr_at_100
+       value: 39.188
+     - type: mrr_at_1000
+       value: 39.247
+     - type: mrr_at_3
+       value: 35.876000000000005
+     - type: mrr_at_5
+       value: 37.401
+     - type: ndcg_at_1
+       value: 29.492
+     - type: ndcg_at_10
+       value: 41.239
+     - type: ndcg_at_100
+       value: 46.066
+     - type: ndcg_at_1000
+       value: 47.992000000000004
+     - type: ndcg_at_3
+       value: 36.11
+     - type: ndcg_at_5
+       value: 38.772
+     - type: precision_at_1
+       value: 29.492
+     - type: precision_at_10
+       value: 6.260000000000001
+     - type: precision_at_100
+       value: 0.914
+     - type: precision_at_1000
+       value: 0.11100000000000002
+     - type: precision_at_3
+       value: 15.104000000000001
+     - type: precision_at_5
+       value: 10.644
+     - type: recall_at_1
+       value: 27.27
+     - type: recall_at_10
+       value: 54.589
+     - type: recall_at_100
+       value: 76.70700000000001
+     - type: recall_at_1000
+       value: 91.158
+     - type: recall_at_3
+       value: 40.974
+     - type: recall_at_5
+       value: 47.327000000000005
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackMathematicaRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 17.848
+     - type: map_at_10
+       value: 26.207
+     - type: map_at_100
+       value: 27.478
+     - type: map_at_1000
+       value: 27.602
+     - type: map_at_3
+       value: 23.405
+     - type: map_at_5
+       value: 24.98
+     - type: mrr_at_1
+       value: 21.891
+     - type: mrr_at_10
+       value: 31.041999999999998
+     - type: mrr_at_100
+       value: 32.092
+     - type: mrr_at_1000
+       value: 32.151999999999994
+     - type: mrr_at_3
+       value: 28.358
+     - type: mrr_at_5
+       value: 29.969
+     - type: ndcg_at_1
+       value: 21.891
+     - type: ndcg_at_10
+       value: 31.585
+     - type: ndcg_at_100
+       value: 37.531
+     - type: ndcg_at_1000
+       value: 40.256
+     - type: ndcg_at_3
+       value: 26.508
+     - type: ndcg_at_5
+       value: 28.894
+     - type: precision_at_1
+       value: 21.891
+     - type: precision_at_10
+       value: 5.795999999999999
+     - type: precision_at_100
+       value: 0.9990000000000001
+     - type: precision_at_1000
+       value: 0.13799999999999998
+     - type: precision_at_3
+       value: 12.769
+     - type: precision_at_5
+       value: 9.279
+     - type: recall_at_1
+       value: 17.848
+     - type: recall_at_10
+       value: 43.452
+     - type: recall_at_100
+       value: 69.216
+     - type: recall_at_1000
+       value: 88.102
+     - type: recall_at_3
+       value: 29.18
+     - type: recall_at_5
+       value: 35.347
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackPhysicsRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 30.94
+     - type: map_at_10
+       value: 41.248000000000005
+     - type: map_at_100
+       value: 42.495
+     - type: map_at_1000
+       value: 42.602000000000004
+     - type: map_at_3
+       value: 37.939
+     - type: map_at_5
+       value: 39.924
+     - type: mrr_at_1
+       value: 37.824999999999996
+     - type: mrr_at_10
+       value: 47.041
+     - type: mrr_at_100
+       value: 47.83
+     - type: mrr_at_1000
+       value: 47.878
+     - type: mrr_at_3
+       value: 44.466
+     - type: mrr_at_5
+       value: 46.111999999999995
+     - type: ndcg_at_1
+       value: 37.824999999999996
+     - type: ndcg_at_10
+       value: 47.223
+     - type: ndcg_at_100
+       value: 52.394
+     - type: ndcg_at_1000
+       value: 54.432
+     - type: ndcg_at_3
+       value: 42.032000000000004
+     - type: ndcg_at_5
+       value: 44.772
+     - type: precision_at_1
+       value: 37.824999999999996
+     - type: precision_at_10
+       value: 8.393
+     - type: precision_at_100
+       value: 1.2890000000000001
+     - type: precision_at_1000
+       value: 0.164
+     - type: precision_at_3
+       value: 19.698
+     - type: precision_at_5
+       value: 14.013
+     - type: recall_at_1
+       value: 30.94
+     - type: recall_at_10
+       value: 59.316
+     - type: recall_at_100
+       value: 80.783
+     - type: recall_at_1000
+       value: 94.15400000000001
+     - type: recall_at_3
+       value: 44.712
+     - type: recall_at_5
+       value: 51.932
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackProgrammersRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 27.104
+     - type: map_at_10
+       value: 36.675999999999995
+     - type: map_at_100
+       value: 38.076
+     - type: map_at_1000
+       value: 38.189
+     - type: map_at_3
+       value: 33.733999999999995
+     - type: map_at_5
+       value: 35.287
+     - type: mrr_at_1
+       value: 33.904
+     - type: mrr_at_10
+       value: 42.55
+     - type: mrr_at_100
+       value: 43.434
+     - type: mrr_at_1000
+       value: 43.494
+     - type: mrr_at_3
+       value: 40.126
+     - type: mrr_at_5
+       value: 41.473
+     - type: ndcg_at_1
+       value: 33.904
+     - type: ndcg_at_10
+       value: 42.414
+     - type: ndcg_at_100
+       value: 48.203
+     - type: ndcg_at_1000
+       value: 50.437
+     - type: ndcg_at_3
+       value: 37.633
+     - type: ndcg_at_5
+       value: 39.67
+     - type: precision_at_1
+       value: 33.904
+     - type: precision_at_10
+       value: 7.82
+     - type: precision_at_100
+       value: 1.2409999999999999
+     - type: precision_at_1000
+       value: 0.159
+     - type: precision_at_3
+       value: 17.884
+     - type: precision_at_5
+       value: 12.648000000000001
+     - type: recall_at_1
+       value: 27.104
+     - type: recall_at_10
+       value: 53.563
+     - type: recall_at_100
+       value: 78.557
+     - type: recall_at_1000
+       value: 93.533
+     - type: recall_at_3
+       value: 39.92
+     - type: recall_at_5
+       value: 45.457
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 27.707749999999997
+     - type: map_at_10
+       value: 36.961
+     - type: map_at_100
+       value: 38.158833333333334
+     - type: map_at_1000
+       value: 38.270333333333326
+     - type: map_at_3
+       value: 34.07183333333334
+     - type: map_at_5
+       value: 35.69533333333334
+     - type: mrr_at_1
+       value: 32.81875
+     - type: mrr_at_10
+       value: 41.293
+     - type: mrr_at_100
+       value: 42.116499999999995
+     - type: mrr_at_1000
+       value: 42.170249999999996
+     - type: mrr_at_3
+       value: 38.83983333333333
+     - type: mrr_at_5
+       value: 40.29775
+     - type: ndcg_at_1
+       value: 32.81875
+     - type: ndcg_at_10
+       value: 42.355
+     - type: ndcg_at_100
+       value: 47.41374999999999
+     - type: ndcg_at_1000
+       value: 49.5805
+     - type: ndcg_at_3
+       value: 37.52825
+     - type: ndcg_at_5
+       value: 39.83266666666667
+     - type: precision_at_1
+       value: 32.81875
+     - type: precision_at_10
+       value: 7.382416666666666
+     - type: precision_at_100
+       value: 1.1640833333333334
+     - type: precision_at_1000
+       value: 0.15383333333333335
+     - type: precision_at_3
+       value: 17.134166666666665
+     - type: precision_at_5
+       value: 12.174833333333336
+     - type: recall_at_1
+       value: 27.707749999999997
+     - type: recall_at_10
+       value: 53.945
+     - type: recall_at_100
+       value: 76.191
+     - type: recall_at_1000
+       value: 91.101
+     - type: recall_at_3
+       value: 40.39083333333334
+     - type: recall_at_5
+       value: 46.40083333333333
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackStatsRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 26.482
+     - type: map_at_10
+       value: 33.201
+     - type: map_at_100
+       value: 34.107
+     - type: map_at_1000
+       value: 34.197
+     - type: map_at_3
+       value: 31.174000000000003
+     - type: map_at_5
+       value: 32.279
+     - type: mrr_at_1
+       value: 29.908
+     - type: mrr_at_10
+       value: 36.235
+     - type: mrr_at_100
+       value: 37.04
+     - type: mrr_at_1000
+       value: 37.105
+     - type: mrr_at_3
+       value: 34.355999999999995
+     - type: mrr_at_5
+       value: 35.382999999999996
+     - type: ndcg_at_1
+       value: 29.908
+     - type: ndcg_at_10
+       value: 37.325
+     - type: ndcg_at_100
+       value: 41.795
+     - type: ndcg_at_1000
+       value: 44.105
+     - type: ndcg_at_3
+       value: 33.555
+     - type: ndcg_at_5
+       value: 35.266999999999996
+     - type: precision_at_1
+       value: 29.908
+     - type: precision_at_10
+       value: 5.721
+     - type: precision_at_100
+       value: 0.8630000000000001
+     - type: precision_at_1000
+       value: 0.11299999999999999
+     - type: precision_at_3
+       value: 14.008000000000001
+     - type: precision_at_5
+       value: 9.754999999999999
+     - type: recall_at_1
+       value: 26.482
+     - type: recall_at_10
+       value: 47.072
+     - type: recall_at_100
+       value: 67.27
+     - type: recall_at_1000
+       value: 84.371
+     - type: recall_at_3
+       value: 36.65
+     - type: recall_at_5
+       value: 40.774
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackTexRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 18.815
+     - type: map_at_10
+       value: 26.369999999999997
+     - type: map_at_100
+       value: 27.458
+     - type: map_at_1000
+       value: 27.588
+     - type: map_at_3
+       value: 23.990000000000002
+     - type: map_at_5
+       value: 25.345000000000002
+     - type: mrr_at_1
+       value: 22.953000000000003
+     - type: mrr_at_10
+       value: 30.342999999999996
+     - type: mrr_at_100
+       value: 31.241000000000003
+     - type: mrr_at_1000
+       value: 31.319000000000003
+     - type: mrr_at_3
+       value: 28.16
+     - type: mrr_at_5
+       value: 29.406
+     - type: ndcg_at_1
+       value: 22.953000000000003
+     - type: ndcg_at_10
+       value: 31.151
+     - type: ndcg_at_100
+       value: 36.309000000000005
+     - type: ndcg_at_1000
+       value: 39.227000000000004
+     - type: ndcg_at_3
+       value: 26.921
+     - type: ndcg_at_5
+       value: 28.938000000000002
+     - type: precision_at_1
+       value: 22.953000000000003
+     - type: precision_at_10
+       value: 5.602
+     - type: precision_at_100
+       value: 0.9530000000000001
+     - type: precision_at_1000
+       value: 0.13899999999999998
+     - type: precision_at_3
+       value: 12.606
+     - type: precision_at_5
+       value: 9.119
+     - type: recall_at_1
+       value: 18.815
+     - type: recall_at_10
+       value: 41.574
+     - type: recall_at_100
+       value: 64.84400000000001
+     - type: recall_at_1000
+       value: 85.406
+     - type: recall_at_3
+       value: 29.694
+     - type: recall_at_5
+       value: 34.935
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackUnixRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 27.840999999999998
+     - type: map_at_10
+       value: 36.797999999999995
+     - type: map_at_100
+       value: 37.993
+     - type: map_at_1000
+       value: 38.086999999999996
+     - type: map_at_3
+       value: 34.050999999999995
+     - type: map_at_5
+       value: 35.379
+     - type: mrr_at_1
+       value: 32.649
+     - type: mrr_at_10
+       value: 41.025
+     - type: mrr_at_100
+       value: 41.878
+     - type: mrr_at_1000
+       value: 41.929
+     - type: mrr_at_3
+       value: 38.573
+     - type: mrr_at_5
+       value: 39.715
+     - type: ndcg_at_1
+       value: 32.649
+     - type: ndcg_at_10
+       value: 42.142
+     - type: ndcg_at_100
+       value: 47.558
+     - type: ndcg_at_1000
+       value: 49.643
+     - type: ndcg_at_3
+       value: 37.12
+     - type: ndcg_at_5
+       value: 38.983000000000004
+     - type: precision_at_1
+       value: 32.649
+     - type: precision_at_10
+       value: 7.08
+     - type: precision_at_100
+       value: 1.1039999999999999
+     - type: precision_at_1000
+       value: 0.13899999999999998
+     - type: precision_at_3
+       value: 16.698
+     - type: precision_at_5
+       value: 11.511000000000001
+     - type: recall_at_1
+       value: 27.840999999999998
+     - type: recall_at_10
+       value: 54.245
+     - type: recall_at_100
+       value: 77.947
+     - type: recall_at_1000
+       value: 92.36999999999999
+     - type: recall_at_3
+       value: 40.146
+     - type: recall_at_5
+       value: 44.951
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackWebmastersRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 26.529000000000003
+     - type: map_at_10
+       value: 35.010000000000005
+     - type: map_at_100
+       value: 36.647
+     - type: map_at_1000
+       value: 36.857
+     - type: map_at_3
+       value: 31.968000000000004
+     - type: map_at_5
+       value: 33.554
+     - type: mrr_at_1
+       value: 31.818
+     - type: mrr_at_10
+       value: 39.550999999999995
+     - type: mrr_at_100
+       value: 40.54
+     - type: mrr_at_1000
+       value: 40.596
+     - type: mrr_at_3
+       value: 36.726
+     - type: mrr_at_5
+       value: 38.416
+     - type: ndcg_at_1
+       value: 31.818
+     - type: ndcg_at_10
+       value: 40.675
+     - type: ndcg_at_100
+       value: 46.548
+     - type: ndcg_at_1000
+       value: 49.126
+     - type: ndcg_at_3
+       value: 35.829
+     - type: ndcg_at_5
+       value: 38.0
+     - type: precision_at_1
+       value: 31.818
+     - type: precision_at_10
+       value: 7.826
+     - type: precision_at_100
+       value: 1.538
+     - type: precision_at_1000
+       value: 0.24
+     - type: precision_at_3
+       value: 16.601
+     - type: precision_at_5
+       value: 12.095
+     - type: recall_at_1
+       value: 26.529000000000003
+     - type: recall_at_10
+       value: 51.03
+     - type: recall_at_100
+       value: 77.556
+     - type: recall_at_1000
+       value: 93.804
+     - type: recall_at_3
+       value: 36.986000000000004
+     - type: recall_at_5
+       value: 43.096000000000004
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackWordpressRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 23.480999999999998
+     - type: map_at_10
+       value: 30.817
+     - type: map_at_100
+       value: 31.838
+     - type: map_at_1000
+       value: 31.932
+     - type: map_at_3
+       value: 28.011999999999997
+     - type: map_at_5
+       value: 29.668
+     - type: mrr_at_1
+       value: 25.323
+     - type: mrr_at_10
+       value: 33.072
+     - type: mrr_at_100
+       value: 33.926
+     - type: mrr_at_1000
+       value: 33.993
+     - type: mrr_at_3
+       value: 30.436999999999998
+     - type: mrr_at_5
+       value: 32.092
+     - type: ndcg_at_1
+       value: 25.323
+     - type: ndcg_at_10
+       value: 35.514
+     - type: ndcg_at_100
+       value: 40.489000000000004
+     - type: ndcg_at_1000
+       value: 42.908
+     - type: ndcg_at_3
+       value: 30.092000000000002
+     - type: ndcg_at_5
+       value: 32.989000000000004
+     - type: precision_at_1
+       value: 25.323
+     - type: precision_at_10
+       value: 5.545
+     - type: precision_at_100
+       value: 0.861
+     - type: precision_at_1000
+       value: 0.117
+     - type: precision_at_3
+       value: 12.446
+     - type: precision_at_5
+       value: 9.131
+     - type: recall_at_1
+       value: 23.480999999999998
+     - type: recall_at_10
+       value: 47.825
+     - type: recall_at_100
+       value: 70.652
+     - type: recall_at_1000
+       value: 88.612
+     - type: recall_at_3
+       value: 33.537
+     - type: recall_at_5
+       value: 40.542
+   - task:
+       type: Retrieval
+     dataset:
+       type: climate-fever
+       name: MTEB ClimateFEVER
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 13.333999999999998
+     - type: map_at_10
+       value: 22.524
+     - type: map_at_100
+       value: 24.506
+     - type: map_at_1000
+       value: 24.715
+     - type: map_at_3
+       value: 19.022
+     - type: map_at_5
+       value: 20.693
+     - type: mrr_at_1
+       value: 29.186
+     - type: mrr_at_10
+       value: 41.22
+     - type: mrr_at_100
+       value: 42.16
+     - type: mrr_at_1000
+       value: 42.192
+     - type: mrr_at_3
+       value: 38.013000000000005
+     - type: mrr_at_5
+       value: 39.704
+     - type: ndcg_at_1
+       value: 29.186
+     - type: ndcg_at_10
+       value: 31.167
+     - type: ndcg_at_100
+       value: 38.879000000000005
+     - type: ndcg_at_1000
+       value: 42.376000000000005
+     - type: ndcg_at_3
+       value: 25.817
+     - type: ndcg_at_5
+       value: 27.377000000000002
+     - type: precision_at_1
+       value: 29.186
+     - type: precision_at_10
+       value: 9.693999999999999
+     - type: precision_at_100
+       value: 1.8030000000000002
+     - type: precision_at_1000
+       value: 0.246
+     - type: precision_at_3
+       value: 19.11
+     - type: precision_at_5
+       value: 14.344999999999999
+     - type: recall_at_1
+       value: 13.333999999999998
+     - type: recall_at_10
+       value: 37.092000000000006
+     - type: recall_at_100
+       value: 63.651
+     - type: recall_at_1000
+       value: 83.05
+     - type: recall_at_3
+       value: 23.74
+     - type: recall_at_5
+       value: 28.655
+   - task:
+       type: Retrieval
+     dataset:
+       type: dbpedia-entity
+       name: MTEB DBPedia
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 9.151
+     - type: map_at_10
+       value: 19.653000000000002
+     - type: map_at_100
+       value: 28.053
+     - type: map_at_1000
+       value: 29.709000000000003
+     - type: map_at_3
+       value: 14.191
+     - type: map_at_5
+       value: 16.456
+     - type: mrr_at_1
+       value: 66.25
+     - type: mrr_at_10
+       value: 74.4
+     - type: mrr_at_100
+       value: 74.715
+     - type: mrr_at_1000
+       value: 74.726
+     - type: mrr_at_3
+       value: 72.417
+     - type: mrr_at_5
+       value: 73.667
+     - type: ndcg_at_1
+       value: 54.25
+     - type: ndcg_at_10
+       value: 40.77
+     - type: ndcg_at_100
+       value: 46.359
+     - type: ndcg_at_1000
+       value: 54.193000000000005
+     - type: ndcg_at_3
+       value: 44.832
+     - type: ndcg_at_5
+       value: 42.63
+     - type: precision_at_1
+       value: 66.25
+     - type: precision_at_10
+       value: 32.175
+     - type: precision_at_100
+       value: 10.668
+     - type: precision_at_1000
+       value: 2.067
+     - type: precision_at_3
+       value: 47.667
+     - type: precision_at_5
+       value: 41.3
+     - type: recall_at_1
+       value: 9.151
+     - type: recall_at_10
+       value: 25.003999999999998
+     - type: recall_at_100
+       value: 52.976
+     - type: recall_at_1000
+       value: 78.315
+     - type: recall_at_3
+       value: 15.487
+     - type: recall_at_5
+       value: 18.999
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/emotion
+       name: MTEB EmotionClassification
+       config: default
+       split: test
+       revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
+     metrics:
+     - type: accuracy
+       value: 51.89999999999999
+     - type: f1
+       value: 46.47777925067403
+   - task:
+       type: Retrieval
+     dataset:
+       type: fever
+       name: MTEB FEVER
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 73.706
+     - type: map_at_10
+       value: 82.423
+     - type: map_at_100
+       value: 82.67999999999999
+     - type: map_at_1000
+       value: 82.694
+     - type: map_at_3
+       value: 81.328
+     - type: map_at_5
+       value: 82.001
+     - type: mrr_at_1
+       value: 79.613
+     - type: mrr_at_10
+       value: 87.07000000000001
+     - type: mrr_at_100
+       value: 87.169
+     - type: mrr_at_1000
+       value: 87.17
+     - type: mrr_at_3
+       value: 86.404
+     - type: mrr_at_5
+       value: 86.856
+     - type: ndcg_at_1
+       value: 79.613
+     - type: ndcg_at_10
+       value: 86.289
+     - type: ndcg_at_100
+       value: 87.201
+     - type: ndcg_at_1000
+       value: 87.428
+     - type: ndcg_at_3
+       value: 84.625
+     - type: ndcg_at_5
+       value: 85.53699999999999
+     - type: precision_at_1
+       value: 79.613
+     - type: precision_at_10
+       value: 10.399
+     - type: precision_at_100
+       value: 1.1079999999999999
+     - type: precision_at_1000
+       value: 0.11499999999999999
+     - type: precision_at_3
+       value: 32.473
+     - type: precision_at_5
+       value: 20.132
+     - type: recall_at_1
+       value: 73.706
+     - type: recall_at_10
+       value: 93.559
+     - type: recall_at_100
+       value: 97.188
+     - type: recall_at_1000
+       value: 98.555
+     - type: recall_at_3
+       value: 88.98700000000001
+     - type: recall_at_5
+       value: 91.373
+   - task:
+       type: Retrieval
+     dataset:
+       type: fiqa
+       name: MTEB FiQA2018
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 19.841
+     - type: map_at_10
+       value: 32.643
+     - type: map_at_100
+       value: 34.575
+     - type: map_at_1000
+       value: 34.736
+     - type: map_at_3
+       value: 28.317999999999998
+     - type: map_at_5
+       value: 30.964000000000002
+     - type: mrr_at_1
+       value: 39.660000000000004
+     - type: mrr_at_10
+       value: 48.620000000000005
+     - type: mrr_at_100
+       value: 49.384
+     - type: mrr_at_1000
+       value: 49.415
+     - type: mrr_at_3
+       value: 45.988
+     - type: mrr_at_5
+       value: 47.361
+     - type: ndcg_at_1
+       value: 39.660000000000004
+     - type: ndcg_at_10
+       value: 40.646
+     - type: ndcg_at_100
+       value: 47.657
+     - type: ndcg_at_1000
+       value: 50.428
+     - type: ndcg_at_3
+       value: 36.689
+     - type: ndcg_at_5
+       value: 38.211
+     - type: precision_at_1
+       value: 39.660000000000004
+     - type: precision_at_10
+       value: 11.235000000000001
+     - type: precision_at_100
+       value: 1.8530000000000002
+     - type: precision_at_1000
+       value: 0.23600000000000002
+     - type: precision_at_3
+       value: 24.587999999999997
+     - type: precision_at_5
+       value: 18.395
+     - type: recall_at_1
+       value: 19.841
+     - type: recall_at_10
+       value: 48.135
+     - type: recall_at_100
+       value: 74.224
+     - type: recall_at_1000
+       value: 90.826
+     - type: recall_at_3
+       value: 33.536
+     - type: recall_at_5
+       value: 40.311
+   - task:
+       type: Retrieval
+     dataset:
+       type: hotpotqa
+       name: MTEB HotpotQA
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 40.358
+     - type: map_at_10
+       value: 64.497
+     - type: map_at_100
+       value: 65.362
+     - type: map_at_1000
+       value: 65.41900000000001
+     - type: map_at_3
+       value: 61.06700000000001
+     - type: map_at_5
+       value: 63.317
+     - type: mrr_at_1
+       value: 80.716
+     - type: mrr_at_10
+       value: 86.10799999999999
+     - type: mrr_at_100
+       value: 86.265
+     - type: mrr_at_1000
+       value: 86.27
+     - type: mrr_at_3
+       value: 85.271
+     - type: mrr_at_5
+       value: 85.82499999999999
+     - type: ndcg_at_1
+       value: 80.716
+     - type: ndcg_at_10
+       value: 72.597
+     - type: ndcg_at_100
+       value: 75.549
+     - type: ndcg_at_1000
+       value: 76.61
+     - type: ndcg_at_3
+       value: 67.874
+     - type: ndcg_at_5
+       value: 70.655
+     - type: precision_at_1
+       value: 80.716
+     - type: precision_at_10
+       value: 15.148
+     - type: precision_at_100
+       value: 1.745
+     - type: precision_at_1000
+       value: 0.188
+     - type: precision_at_3
+       value: 43.597
+     - type: precision_at_5
+       value: 28.351
+     - type: recall_at_1
+       value: 40.358
+     - type: recall_at_10
+       value: 75.739
+     - type: recall_at_100
+       value: 87.259
+     - type: recall_at_1000
+       value: 94.234
+     - type: recall_at_3
+       value: 65.39500000000001
+     - type: recall_at_5
+       value: 70.878
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/imdb
+       name: MTEB ImdbClassification
+       config: default
+       split: test
+       revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
+     metrics:
+     - type: accuracy
+       value: 90.80799999999998
+     - type: ap
+       value: 86.81350378180757
+     - type: f1
+       value: 90.79901248314215
+   - task:
+       type: Retrieval
+     dataset:
+       type: msmarco
+       name: MTEB MSMARCO
+       config: default
+       split: dev
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 22.096
+     - type: map_at_10
+       value: 34.384
+     - type: map_at_100
+       value: 35.541
+     - type: map_at_1000
+       value: 35.589999999999996
+     - type: map_at_3
+       value: 30.496000000000002
+     - type: map_at_5
+       value: 32.718
+     - type: mrr_at_1
+       value: 22.750999999999998
+     - type: mrr_at_10
+       value: 35.024
+     - type: mrr_at_100
+       value: 36.125
+     - type: mrr_at_1000
+       value: 36.168
+     - type: mrr_at_3
+       value: 31.225
+     - type: mrr_at_5
+       value: 33.416000000000004
+     - type: ndcg_at_1
+       value: 22.750999999999998
+     - type: ndcg_at_10
+       value: 41.351
+     - type: ndcg_at_100
+       value: 46.92
+     - type: ndcg_at_1000
+       value: 48.111
+     - type: ndcg_at_3
+       value: 33.439
+     - type: ndcg_at_5
+       value: 37.407000000000004
+     - type: precision_at_1
+       value: 22.750999999999998
+     - type: precision_at_10
+       value: 6.564
+     - type: precision_at_100
+       value: 0.935
+     - type: precision_at_1000
+       value: 0.104
+     - type: precision_at_3
+       value: 14.288
+     - type: precision_at_5
+       value: 10.581999999999999
+     - type: recall_at_1
+       value: 22.096
+     - type: recall_at_10
+       value: 62.771
+     - type: recall_at_100
+       value: 88.529
+     - type: recall_at_1000
+       value: 97.55
+     - type: recall_at_3
+       value: 41.245
+     - type: recall_at_5
+       value: 50.788
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/mtop_domain
+       name: MTEB MTOPDomainClassification (en)
+       config: en
+       split: test
+       revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
+     metrics:
+     - type: accuracy
+       value: 94.16780665754673
+     - type: f1
+       value: 93.96331194859894
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/mtop_intent
+       name: MTEB MTOPIntentClassification (en)
+       config: en
+       split: test
+       revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
+     metrics:
+     - type: accuracy
+       value: 76.90606475148198
+     - type: f1
+       value: 58.58344986604187
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_massive_intent
+       name: MTEB MassiveIntentClassification (en)
+       config: en
+       split: test
+       revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
+     metrics:
+     - type: accuracy
+       value: 76.14660390047075
+     - type: f1
+       value: 74.31533923533614
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_massive_scenario
+       name: MTEB MassiveScenarioClassification (en)
+       config: en
+       split: test
+       revision: 7d571f92784cd94a019292a1f45445077d0ef634
+     metrics:
+     - type: accuracy
+       value: 80.16139878950908
+     - type: f1
+       value: 80.18532656824924
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/medrxiv-clustering-p2p
+       name: MTEB MedrxivClusteringP2P
+       config: default
+       split: test
+       revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
+     metrics:
+     - type: v_measure
+       value: 32.949880906135085
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/medrxiv-clustering-s2s
+       name: MTEB MedrxivClusteringS2S
+       config: default
+       split: test
+       revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
+     metrics:
+     - type: v_measure
+       value: 31.56300351524862
+   - task:
+       type: Reranking
+     dataset:
+       type: mteb/mind_small
+       name: MTEB MindSmallReranking
+       config: default
+       split: test
+       revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
+     metrics:
+     - type: map
+       value: 31.196521894371315
+     - type: mrr
+       value: 32.22644231694389
+   - task:
+       type: Retrieval
+     dataset:
+       type: nfcorpus
+       name: MTEB NFCorpus
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 6.783
+     - type: map_at_10
+       value: 14.549000000000001
+     - type: map_at_100
+       value: 18.433
+     - type: map_at_1000
+       value: 19.949
+     - type: map_at_3
+       value: 10.936
+     - type: map_at_5
+       value: 12.514
+     - type: mrr_at_1
+       value: 47.368
+     - type: mrr_at_10
+       value: 56.42
+     - type: mrr_at_100
+       value: 56.908
+     - type: mrr_at_1000
+       value: 56.95
+     - type: mrr_at_3
+       value: 54.283
+     - type: mrr_at_5
+       value: 55.568
+     - type: ndcg_at_1
+       value: 45.666000000000004
+     - type: ndcg_at_10
+       value: 37.389
+     - type: ndcg_at_100
+       value: 34.253
+     - type: ndcg_at_1000
+       value: 43.059999999999995
+     - type: ndcg_at_3
+       value: 42.725
+     - type: ndcg_at_5
+       value: 40.193
+     - type: precision_at_1
+       value: 47.368
+     - type: precision_at_10
+       value: 27.988000000000003
+     - type: precision_at_100
+       value: 8.672
+     - type: precision_at_1000
+       value: 2.164
+     - type: precision_at_3
+       value: 40.248
+     - type: precision_at_5
+       value: 34.737
+     - type: recall_at_1
+       value: 6.783
+     - type: recall_at_10
+       value: 17.838
+     - type: recall_at_100
+       value: 33.672000000000004
+     - type: recall_at_1000
+       value: 66.166
+     - type: recall_at_3
+       value: 11.849
+     - type: recall_at_5
+       value: 14.205000000000002
+   - task:
+       type: Retrieval
+     dataset:
+       type: nq
+       name: MTEB NQ
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 31.698999999999998
+     - type: map_at_10
+       value: 46.556
+     - type: map_at_100
+       value: 47.652
+     - type: map_at_1000
+       value: 47.68
+     - type: map_at_3
+       value: 42.492000000000004
+     - type: map_at_5
+       value: 44.763999999999996
+     - type: mrr_at_1
+       value: 35.747
+     - type: mrr_at_10
+       value: 49.242999999999995
+     - type: mrr_at_100
+       value: 50.052
+     - type: mrr_at_1000
+       value: 50.068
+     - type: mrr_at_3
+       value: 45.867000000000004
+     - type: mrr_at_5
+       value: 47.778999999999996
+     - type: ndcg_at_1
+       value: 35.717999999999996
+     - type: ndcg_at_10
+       value: 54.14600000000001
+     - type: ndcg_at_100
+       value: 58.672999999999995
+     - type: ndcg_at_1000
+       value: 59.279
+     - type: ndcg_at_3
+       value: 46.407
+     - type: ndcg_at_5
+       value: 50.181
+     - type: precision_at_1
+       value: 35.717999999999996
+     - type: precision_at_10
+       value: 8.844000000000001
+     - type: precision_at_100
+       value: 1.139
+     - type: precision_at_1000
+       value: 0.12
+     - type: precision_at_3
+       value: 20.993000000000002
+     - type: precision_at_5
+       value: 14.791000000000002
+     - type: recall_at_1
+       value: 31.698999999999998
+     - type: recall_at_10
+       value: 74.693
+     - type: recall_at_100
+       value: 94.15299999999999
+     - type: recall_at_1000
+       value: 98.585
+     - type: recall_at_3
+       value: 54.388999999999996
+     - type: recall_at_5
+       value: 63.08200000000001
+   - task:
+       type: Retrieval
+     dataset:
+       type: quora
+       name: MTEB QuoraRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 71.283
+     - type: map_at_10
+       value: 85.24000000000001
+     - type: map_at_100
+       value: 85.882
+     - type: map_at_1000
+       value: 85.897
+     - type: map_at_3
+       value: 82.326
+     - type: map_at_5
+       value: 84.177
+     - type: mrr_at_1
+       value: 82.21000000000001
+     - type: mrr_at_10
+       value: 88.228
+     - type: mrr_at_100
+       value: 88.32
+     - type: mrr_at_1000
+       value: 88.32
+     - type: mrr_at_3
+       value: 87.323
+     - type: mrr_at_5
+       value: 87.94800000000001
+     - type: ndcg_at_1
+       value: 82.17999999999999
+     - type: ndcg_at_10
+       value: 88.9
+     - type: ndcg_at_100
+       value: 90.079
+     - type: ndcg_at_1000
+       value: 90.158
+     - type: ndcg_at_3
+       value: 86.18299999999999
+     - type: ndcg_at_5
+       value: 87.71799999999999
+     - type: precision_at_1
+       value: 82.17999999999999
+     - type: precision_at_10
+       value: 13.464
+     - type: precision_at_100
+       value: 1.533
+     - type: precision_at_1000
+       value: 0.157
+     - type: precision_at_3
+       value: 37.693
+     - type: precision_at_5
+       value: 24.792
+     - type: recall_at_1
+       value: 71.283
+     - type: recall_at_10
+       value: 95.742
+     - type: recall_at_100
+       value: 99.67200000000001
+     - type: recall_at_1000
+       value: 99.981
+     - type: recall_at_3
+       value: 87.888
+     - type: recall_at_5
+       value: 92.24
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/reddit-clustering
+       name: MTEB RedditClustering
+       config: default
+       split: test
+       revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
+     metrics:
+     - type: v_measure
+       value: 56.24267063669042
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/reddit-clustering-p2p
+       name: MTEB RedditClusteringP2P
+       config: default
+       split: test
+       revision: 282350215ef01743dc01b456c7f5241fa8937f16
+     metrics:
+     - type: v_measure
+       value: 62.88056988932578
+   - task:
+       type: Retrieval
+     dataset:
+       type: scidocs
+       name: MTEB SCIDOCS
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 4.903
+     - type: map_at_10
+       value: 13.202
+     - type: map_at_100
+       value: 15.5
+     - type: map_at_1000
+       value: 15.870999999999999
+     - type: map_at_3
+       value: 9.407
+     - type: map_at_5
+       value: 11.238
+     - type: mrr_at_1
+       value: 24.2
+     - type: mrr_at_10
+       value: 35.867
+     - type: mrr_at_100
+       value: 37.001
+     - type: mrr_at_1000
+       value: 37.043
+     - type: mrr_at_3
+       value: 32.5
+     - type: mrr_at_5
+       value: 34.35
+     - type: ndcg_at_1
+       value: 24.2
+     - type: ndcg_at_10
+       value: 21.731
+     - type: ndcg_at_100
+       value: 30.7
+     - type: ndcg_at_1000
+       value: 36.618
+     - type: ndcg_at_3
+       value: 20.72
+     - type: ndcg_at_5
+       value: 17.954
+     - type: precision_at_1
+       value: 24.2
+     - type: precision_at_10
+       value: 11.33
+     - type: precision_at_100
+       value: 2.4410000000000003
+     - type: precision_at_1000
+       value: 0.386
+     - type: precision_at_3
+       value: 19.667
+     - type: precision_at_5
+       value: 15.86
+     - type: recall_at_1
+       value: 4.903
+     - type: recall_at_10
+       value: 22.962
+     - type: recall_at_100
+       value: 49.563
+     - type: recall_at_1000
+       value: 78.238
+     - type: recall_at_3
+       value: 11.953
+     - type: recall_at_5
+       value: 16.067999999999998
+   - task:
+       type: STS
+     dataset:
+       type: mteb/sickr-sts
+       name: MTEB SICK-R
+       config: default
+       split: test
+       revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
+     metrics:
+     - type: cos_sim_pearson
+       value: 84.12694254604078
+     - type: cos_sim_spearman
+       value: 80.30141815181918
+     - type: euclidean_pearson
+       value: 81.34015449877128
+     - type: euclidean_spearman
+       value: 80.13984197010849
+     - type: manhattan_pearson
+       value: 81.31767068124086
+     - type: manhattan_spearman
+       value: 80.11720513114103
+   - task:
+       type: STS
+     dataset:
+       type: mteb/sts12-sts
+       name: MTEB STS12
+       config: default
+       split: test
+       revision: a0d554a64d88156834ff5ae9920b964011b16384
+     metrics:
+     - type: cos_sim_pearson
+       value: 86.13112984010417
+     - type: cos_sim_spearman
+       value: 78.03063573402875
+     - type: euclidean_pearson
+       value: 83.51928418844804
+     - type: euclidean_spearman
+       value: 78.4045235411144
+     - type: manhattan_pearson
+       value: 83.49981637388689
+     - type: manhattan_spearman
+       value: 78.4042575139372
+   - task:
+       type: STS
+     dataset:
+       type: mteb/sts13-sts
+       name: MTEB STS13
+       config: default
+       split: test
+       revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
+     metrics:
+     - type: cos_sim_pearson
+       value: 82.50327987379504
+     - type: cos_sim_spearman
+       value: 84.18556767756205
+     - type: euclidean_pearson
+       value: 82.69684424327679
+     - type: euclidean_spearman
+       value: 83.5368106038335
+     - type: manhattan_pearson
+       value: 82.57967581007374
+     - type: manhattan_spearman
+       value: 83.43009053133697
+   - task:
+       type: STS
+     dataset:
+       type: mteb/sts14-sts
+       name: MTEB STS14
+       config: default
+       split: test
+       revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
+     metrics:
+     - type: cos_sim_pearson
+       value: 82.50756863007814
+     - type: cos_sim_spearman
+       value: 82.27204331279108
+     - type: euclidean_pearson
+       value: 81.39535251429741
+     - type: euclidean_spearman
+       value: 81.84386626336239
+     - type: manhattan_pearson
+       value: 81.34281737280695
+     - type: manhattan_spearman
+       value: 81.81149375673166
+   - task:
+       type: STS
+     dataset:
+       type: mteb/sts15-sts
+       name: MTEB STS15
+       config: default
+       split: test
+       revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
+     metrics:
+     - type: cos_sim_pearson
+       value: 86.8727714856726
+     - type: cos_sim_spearman
+       value: 87.95738287792312
+     - type: euclidean_pearson
+       value: 86.62920602795887
+     - type: euclidean_spearman
+       value: 87.05207355381243
+     - type: manhattan_pearson
+       value: 86.53587918472225
+     - type: manhattan_spearman
+       value: 86.95382961029586
+   - task:
+       type: STS
+     dataset:
+       type: mteb/sts16-sts
+       name: MTEB STS16
+       config: default
+       split: test
+       revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
+     metrics:
+     - type: cos_sim_pearson
+       value: 83.52240359769479
+     - type: cos_sim_spearman
+       value: 85.47685776238286
+     - type: euclidean_pearson
+       value: 84.25815333483058
+     - type: euclidean_spearman
+       value: 85.27415639683198
+     - type: manhattan_pearson
+       value: 84.29127757025637
+     - type: manhattan_spearman
+       value: 85.30226224917351
+   - task:
+       type: STS
+     dataset:
+       type: mteb/sts17-crosslingual-sts
+       name: MTEB STS17 (en-en)
+       config: en-en
+       split: test
+       revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
+     metrics:
+     - type: cos_sim_pearson
+       value: 86.42501708915708
+     - type: cos_sim_spearman
+       value: 86.42276182795041
+     - type: euclidean_pearson
+       value: 86.5408207354761
+     - type: euclidean_spearman
+       value: 85.46096321750838
+     - type: manhattan_pearson
+       value: 86.54177303026881
+     - type: manhattan_spearman
+       value: 85.50313151916117
+   - task:
+       type: STS
+     dataset:
+       type: mteb/sts22-crosslingual-sts
+       name: MTEB STS22 (en)
+       config: en
+       split: test
+       revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
+     metrics:
+     - type: cos_sim_pearson
+       value: 64.86521089250766
+     - type: cos_sim_spearman
+       value: 65.94868540323003
+     - type: euclidean_pearson
+       value: 67.16569626533084
+     - type: euclidean_spearman
+       value: 66.37667004134917
+     - type: manhattan_pearson
+       value: 67.1482365102333
+     - type: manhattan_spearman
+       value: 66.53240122580029
+   - task:
+       type: STS
+     dataset:
+       type: mteb/stsbenchmark-sts
+       name: MTEB STSBenchmark
+       config: default
+       split: test
+       revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
+     metrics:
+     - type: cos_sim_pearson
+       value: 84.64746265365318
+     - type: cos_sim_spearman
+       value: 86.41888825906786
+     - type: euclidean_pearson
+       value: 85.27453642725811
+     - type: euclidean_spearman
+       value: 85.94095796602544
+     - type: manhattan_pearson
+       value: 85.28643660505334
+     - type: manhattan_spearman
+       value: 85.95028003260744
+   - task:
+       type: Reranking
+     dataset:
+       type: mteb/scidocs-reranking
+       name: MTEB SciDocsRR
+       config: default
+       split: test
+       revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
+     metrics:
+     - type: map
+       value: 87.48903153618527
+     - type: mrr
+       value: 96.41081503826601
+   - task:
+ type: Retrieval
2145
+ dataset:
2146
+ type: scifact
2147
+ name: MTEB SciFact
2148
+ config: default
2149
+ split: test
2150
+ revision: None
2151
+ metrics:
2152
+ - type: map_at_1
2153
+ value: 58.594
2154
+ - type: map_at_10
2155
+ value: 69.296
2156
+ - type: map_at_100
2157
+ value: 69.782
2158
+ - type: map_at_1000
2159
+ value: 69.795
2160
+ - type: map_at_3
2161
+ value: 66.23
2162
+ - type: map_at_5
2163
+ value: 68.293
2164
+ - type: mrr_at_1
2165
+ value: 61.667
2166
+ - type: mrr_at_10
2167
+ value: 70.339
2168
+ - type: mrr_at_100
2169
+ value: 70.708
2170
+ - type: mrr_at_1000
2171
+ value: 70.722
2172
+ - type: mrr_at_3
2173
+ value: 68.0
2174
+ - type: mrr_at_5
2175
+ value: 69.56700000000001
2176
+ - type: ndcg_at_1
2177
+ value: 61.667
2178
+ - type: ndcg_at_10
2179
+ value: 74.039
2180
+ - type: ndcg_at_100
2181
+ value: 76.103
2182
+ - type: ndcg_at_1000
2183
+ value: 76.47800000000001
2184
+ - type: ndcg_at_3
2185
+ value: 68.967
2186
+ - type: ndcg_at_5
2187
+ value: 71.96900000000001
2188
+ - type: precision_at_1
2189
+ value: 61.667
2190
+ - type: precision_at_10
2191
+ value: 9.866999999999999
2192
+ - type: precision_at_100
2193
+ value: 1.097
2194
+ - type: precision_at_1000
2195
+ value: 0.11299999999999999
2196
+ - type: precision_at_3
2197
+ value: 27.111
2198
+ - type: precision_at_5
2199
+ value: 18.2
2200
+ - type: recall_at_1
2201
+ value: 58.594
2202
+ - type: recall_at_10
2203
+ value: 87.422
2204
+ - type: recall_at_100
2205
+ value: 96.667
2206
+ - type: recall_at_1000
2207
+ value: 99.667
2208
+ - type: recall_at_3
2209
+ value: 74.217
2210
+ - type: recall_at_5
2211
+ value: 81.539
2212
+ - task:
2213
+ type: PairClassification
2214
+ dataset:
2215
+ type: mteb/sprintduplicatequestions-pairclassification
2216
+ name: MTEB SprintDuplicateQuestions
2217
+ config: default
2218
+ split: test
2219
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
2220
+ metrics:
2221
+ - type: cos_sim_accuracy
2222
+ value: 99.85049504950496
2223
+ - type: cos_sim_ap
2224
+ value: 96.33111544137081
2225
+ - type: cos_sim_f1
2226
+ value: 92.35443037974684
2227
+ - type: cos_sim_precision
2228
+ value: 93.53846153846153
2229
+ - type: cos_sim_recall
2230
+ value: 91.2
2231
+ - type: dot_accuracy
2232
+ value: 99.82376237623762
2233
+ - type: dot_ap
2234
+ value: 95.38082527310888
2235
+ - type: dot_f1
2236
+ value: 90.90909090909092
2237
+ - type: dot_precision
2238
+ value: 92.90187891440502
2239
+ - type: dot_recall
2240
+ value: 89.0
2241
+ - type: euclidean_accuracy
2242
+ value: 99.84851485148515
2243
+ - type: euclidean_ap
2244
+ value: 96.32316003996347
2245
+ - type: euclidean_f1
2246
+ value: 92.2071392659628
2247
+ - type: euclidean_precision
2248
+ value: 92.71991911021233
2249
+ - type: euclidean_recall
2250
+ value: 91.7
2251
+ - type: manhattan_accuracy
2252
+ value: 99.84851485148515
2253
+ - type: manhattan_ap
2254
+ value: 96.3655668249217
2255
+ - type: manhattan_f1
2256
+ value: 92.18356026222895
2257
+ - type: manhattan_precision
2258
+ value: 92.98067141403867
2259
+ - type: manhattan_recall
2260
+ value: 91.4
2261
+ - type: max_accuracy
2262
+ value: 99.85049504950496
2263
+ - type: max_ap
2264
+ value: 96.3655668249217
2265
+ - type: max_f1
2266
+ value: 92.35443037974684
2267
+ - task:
2268
+ type: Clustering
2269
+ dataset:
2270
+ type: mteb/stackexchange-clustering
2271
+ name: MTEB StackExchangeClustering
2272
+ config: default
2273
+ split: test
2274
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
2275
+ metrics:
2276
+ - type: v_measure
2277
+ value: 65.94861371629051
2278
+ - task:
2279
+ type: Clustering
2280
+ dataset:
2281
+ type: mteb/stackexchange-clustering-p2p
2282
+ name: MTEB StackExchangeClusteringP2P
2283
+ config: default
2284
+ split: test
2285
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
2286
+ metrics:
2287
+ - type: v_measure
2288
+ value: 35.009430451385
2289
+ - task:
2290
+ type: Reranking
2291
+ dataset:
2292
+ type: mteb/stackoverflowdupquestions-reranking
2293
+ name: MTEB StackOverflowDupQuestions
2294
+ config: default
2295
+ split: test
2296
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
2297
+ metrics:
2298
+ - type: map
2299
+ value: 54.61164066427969
2300
+ - type: mrr
2301
+ value: 55.49710603938544
2302
+ - task:
2303
+ type: Summarization
2304
+ dataset:
2305
+ type: mteb/summeval
2306
+ name: MTEB SummEval
2307
+ config: default
2308
+ split: test
2309
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
2310
+ metrics:
2311
+ - type: cos_sim_pearson
2312
+ value: 30.622620124907662
2313
+ - type: cos_sim_spearman
2314
+ value: 31.0678351356163
2315
+ - type: dot_pearson
2316
+ value: 30.863727693306814
2317
+ - type: dot_spearman
2318
+ value: 31.230306567021255
2319
+ - task:
2320
+ type: Retrieval
2321
+ dataset:
2322
+ type: trec-covid
2323
+ name: MTEB TRECCOVID
2324
+ config: default
2325
+ split: test
2326
+ revision: None
2327
+ metrics:
2328
+ - type: map_at_1
2329
+ value: 0.22
2330
+ - type: map_at_10
2331
+ value: 2.011
2332
+ - type: map_at_100
2333
+ value: 10.974
2334
+ - type: map_at_1000
2335
+ value: 25.819
2336
+ - type: map_at_3
2337
+ value: 0.6649999999999999
2338
+ - type: map_at_5
2339
+ value: 1.076
2340
+ - type: mrr_at_1
2341
+ value: 86.0
2342
+ - type: mrr_at_10
2343
+ value: 91.8
2344
+ - type: mrr_at_100
2345
+ value: 91.8
2346
+ - type: mrr_at_1000
2347
+ value: 91.8
2348
+ - type: mrr_at_3
2349
+ value: 91.0
2350
+ - type: mrr_at_5
2351
+ value: 91.8
2352
+ - type: ndcg_at_1
2353
+ value: 82.0
2354
+ - type: ndcg_at_10
2355
+ value: 78.07300000000001
2356
+ - type: ndcg_at_100
2357
+ value: 58.231
2358
+ - type: ndcg_at_1000
2359
+ value: 51.153000000000006
2360
+ - type: ndcg_at_3
2361
+ value: 81.123
2362
+ - type: ndcg_at_5
2363
+ value: 81.059
2364
+ - type: precision_at_1
2365
+ value: 86.0
2366
+ - type: precision_at_10
2367
+ value: 83.0
2368
+ - type: precision_at_100
2369
+ value: 59.38
2370
+ - type: precision_at_1000
2371
+ value: 22.55
2372
+ - type: precision_at_3
2373
+ value: 87.333
2374
+ - type: precision_at_5
2375
+ value: 86.8
2376
+ - type: recall_at_1
2377
+ value: 0.22
2378
+ - type: recall_at_10
2379
+ value: 2.2079999999999997
2380
+ - type: recall_at_100
2381
+ value: 14.069
2382
+ - type: recall_at_1000
2383
+ value: 47.678
2384
+ - type: recall_at_3
2385
+ value: 0.7040000000000001
2386
+ - type: recall_at_5
2387
+ value: 1.161
2388
+ - task:
2389
+ type: Retrieval
2390
+ dataset:
2391
+ type: webis-touche2020
2392
+ name: MTEB Touche2020
2393
+ config: default
2394
+ split: test
2395
+ revision: None
2396
+ metrics:
2397
+ - type: map_at_1
2398
+ value: 2.809
2399
+ - type: map_at_10
2400
+ value: 10.394
2401
+ - type: map_at_100
2402
+ value: 16.598
2403
+ - type: map_at_1000
2404
+ value: 18.142
2405
+ - type: map_at_3
2406
+ value: 5.572
2407
+ - type: map_at_5
2408
+ value: 7.1370000000000005
2409
+ - type: mrr_at_1
2410
+ value: 32.653
2411
+ - type: mrr_at_10
2412
+ value: 46.564
2413
+ - type: mrr_at_100
2414
+ value: 47.469
2415
+ - type: mrr_at_1000
2416
+ value: 47.469
2417
+ - type: mrr_at_3
2418
+ value: 42.177
2419
+ - type: mrr_at_5
2420
+ value: 44.524
2421
+ - type: ndcg_at_1
2422
+ value: 30.612000000000002
2423
+ - type: ndcg_at_10
2424
+ value: 25.701
2425
+ - type: ndcg_at_100
2426
+ value: 37.532
2427
+ - type: ndcg_at_1000
2428
+ value: 48.757
2429
+ - type: ndcg_at_3
2430
+ value: 28.199999999999996
2431
+ - type: ndcg_at_5
2432
+ value: 25.987
2433
+ - type: precision_at_1
2434
+ value: 32.653
2435
+ - type: precision_at_10
2436
+ value: 23.469
2437
+ - type: precision_at_100
2438
+ value: 7.9799999999999995
2439
+ - type: precision_at_1000
2440
+ value: 1.5350000000000001
2441
+ - type: precision_at_3
2442
+ value: 29.932
2443
+ - type: precision_at_5
2444
+ value: 26.122
2445
+ - type: recall_at_1
2446
+ value: 2.809
2447
+ - type: recall_at_10
2448
+ value: 16.887
2449
+ - type: recall_at_100
2450
+ value: 48.67
2451
+ - type: recall_at_1000
2452
+ value: 82.89699999999999
2453
+ - type: recall_at_3
2454
+ value: 6.521000000000001
2455
+ - type: recall_at_5
2456
+ value: 9.609
2457
+ - task:
2458
+ type: Classification
2459
+ dataset:
2460
+ type: mteb/toxic_conversations_50k
2461
+ name: MTEB ToxicConversationsClassification
2462
+ config: default
2463
+ split: test
2464
+ revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
2465
+ metrics:
2466
+ - type: accuracy
2467
+ value: 71.57860000000001
2468
+ - type: ap
2469
+ value: 13.82629211536393
2470
+ - type: f1
2471
+ value: 54.59860966183956
2472
+ - task:
2473
+ type: Classification
2474
+ dataset:
2475
+ type: mteb/tweet_sentiment_extraction
2476
+ name: MTEB TweetSentimentExtractionClassification
2477
+ config: default
2478
+ split: test
2479
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
2480
+ metrics:
2481
+ - type: accuracy
2482
+ value: 59.38030560271647
2483
+ - type: f1
2484
+ value: 59.69685552567865
2485
+ - task:
2486
+ type: Clustering
2487
+ dataset:
2488
+ type: mteb/twentynewsgroups-clustering
2489
+ name: MTEB TwentyNewsgroupsClustering
2490
+ config: default
2491
+ split: test
2492
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
2493
+ metrics:
2494
+ - type: v_measure
2495
+ value: 51.4736717043405
2496
+ - task:
2497
+ type: PairClassification
2498
+ dataset:
2499
+ type: mteb/twittersemeval2015-pairclassification
2500
+ name: MTEB TwitterSemEval2015
2501
+ config: default
2502
+ split: test
2503
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
2504
+ metrics:
2505
+ - type: cos_sim_accuracy
2506
+ value: 86.92853311080646
2507
+ - type: cos_sim_ap
2508
+ value: 77.67872502591382
2509
+ - type: cos_sim_f1
2510
+ value: 70.33941236068895
2511
+ - type: cos_sim_precision
2512
+ value: 67.63273258645884
2513
+ - type: cos_sim_recall
2514
+ value: 73.27176781002639
2515
+ - type: dot_accuracy
2516
+ value: 85.79603027954938
2517
+ - type: dot_ap
2518
+ value: 73.73786190233379
2519
+ - type: dot_f1
2520
+ value: 67.3437901774235
2521
+ - type: dot_precision
2522
+ value: 65.67201604814443
2523
+ - type: dot_recall
2524
+ value: 69.10290237467018
2525
+ - type: euclidean_accuracy
2526
+ value: 86.94045419324074
2527
+ - type: euclidean_ap
2528
+ value: 77.6687791535167
2529
+ - type: euclidean_f1
2530
+ value: 70.47209214023542
2531
+ - type: euclidean_precision
2532
+ value: 67.7207492094381
2533
+ - type: euclidean_recall
2534
+ value: 73.45646437994723
2535
+ - type: manhattan_accuracy
2536
+ value: 86.87488823985218
2537
+ - type: manhattan_ap
2538
+ value: 77.63373392430728
2539
+ - type: manhattan_f1
2540
+ value: 70.40920716112532
2541
+ - type: manhattan_precision
2542
+ value: 68.31265508684864
2543
+ - type: manhattan_recall
2544
+ value: 72.63852242744063
2545
+ - type: max_accuracy
2546
+ value: 86.94045419324074
2547
+ - type: max_ap
2548
+ value: 77.67872502591382
2549
+ - type: max_f1
2550
+ value: 70.47209214023542
2551
+ - task:
2552
+ type: PairClassification
2553
+ dataset:
2554
+ type: mteb/twitterurlcorpus-pairclassification
2555
+ name: MTEB TwitterURLCorpus
2556
+ config: default
2557
+ split: test
2558
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
2559
+ metrics:
2560
+ - type: cos_sim_accuracy
2561
+ value: 88.67155664221679
2562
+ - type: cos_sim_ap
2563
+ value: 85.64591703003417
2564
+ - type: cos_sim_f1
2565
+ value: 77.59531005352656
2566
+ - type: cos_sim_precision
2567
+ value: 73.60967184801382
2568
+ - type: cos_sim_recall
2569
+ value: 82.03726516784724
2570
+ - type: dot_accuracy
2571
+ value: 88.41541506578181
2572
+ - type: dot_ap
2573
+ value: 84.6482788957769
2574
+ - type: dot_f1
2575
+ value: 77.04748541466657
2576
+ - type: dot_precision
2577
+ value: 74.02440754931176
2578
+ - type: dot_recall
2579
+ value: 80.3279950723745
2580
+ - type: euclidean_accuracy
2581
+ value: 88.63080684596576
2582
+ - type: euclidean_ap
2583
+ value: 85.44570045321562
2584
+ - type: euclidean_f1
2585
+ value: 77.28769403336106
2586
+ - type: euclidean_precision
2587
+ value: 72.90600040958427
2588
+ - type: euclidean_recall
2589
+ value: 82.22975053895904
2590
+ - type: manhattan_accuracy
2591
+ value: 88.59393798269105
2592
+ - type: manhattan_ap
2593
+ value: 85.40271361038187
2594
+ - type: manhattan_f1
2595
+ value: 77.17606419344392
2596
+ - type: manhattan_precision
2597
+ value: 72.4447747078295
2598
+ - type: manhattan_recall
2599
+ value: 82.5685247921158
2600
+ - type: max_accuracy
2601
+ value: 88.67155664221679
2602
+ - type: max_ap
2603
+ value: 85.64591703003417
2604
+ - type: max_f1
2605
+ value: 77.59531005352656
2606
+ license: mit
2607
+ language:
2608
+ - en
2609
+ ---
2610
+ # Fast-Inference with Ctranslate2
2611
+ Speed up inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.
2612
+
2613
+ Quantized version of [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5).
2614
+ ```bash
2615
+ pip install "hf-hub-ctranslate2>=2.12.0" "ctranslate2>=3.17.1"
2616
+ ```
2617
+
2618
+ ```python
2619
+ # from transformers import AutoTokenizer
2620
+ model_name = "michaelfeil/ct2fast-bge-base-en-v1.5"
2621
+ model_name_orig="BAAI/bge-base-en-v1.5"
2622
+
2623
+ from hf_hub_ctranslate2 import EncoderCT2fromHfHub
2624
+ model = EncoderCT2fromHfHub(
2625
+ # load in int8 on CUDA
2626
+ model_name_or_path=model_name,
2627
+ device="cuda",
2628
+ compute_type="int8_float16"
2629
+ )
2630
+ outputs = model.generate(
2631
+ text=["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
2632
+ max_length=64,
2633
+ ) # perform downstream tasks on outputs
2634
+ outputs["pooler_output"]
2635
+ outputs["last_hidden_state"]
2636
+ outputs["attention_mask"]
2637
+
2638
+ # alternatively, use the SentenceTransformer Mix-In
2639
+ # for end-to-end Sentence embeddings generation
2640
+ # (not pulling from this CT2fast-HF repo)
2641
+
2642
+ from hf_hub_ctranslate2 import CT2SentenceTransformer
2643
+ model = CT2SentenceTransformer(
2644
+ model_name_orig, compute_type="int8_float16", device="cuda"
2645
+ )
2646
+ embeddings = model.encode(
2647
+ ["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
2648
+ batch_size=32,
2649
+ convert_to_numpy=True,
2650
+ normalize_embeddings=True,
2651
+ )
2652
+ print(embeddings.shape, embeddings)
2653
+ scores = (embeddings @ embeddings.T) * 100
2654
+
2655
+ # Hint: you can also host this code via a REST API, e.g.
2656
+ # via github.com/michaelfeil/infinity
2657
+
2658
+
2659
+ ```
2660
+
2661
+ Checkpoint compatible with [ctranslate2>=3.17.1](https://github.com/OpenNMT/CTranslate2)
2662
+ and [hf-hub-ctranslate2>=2.12.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
2663
+ - `compute_type=int8_float16` for `device="cuda"`
2664
+ - `compute_type=int8` for `device="cpu"`
2665
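+
+ For example, a minimal sketch for loading on CPU, using the same `EncoderCT2fromHfHub` API as in the example above:
+
+ ```python
+ from hf_hub_ctranslate2 import EncoderCT2fromHfHub
+
+ # identical call to the CUDA example above; only device and compute_type change
+ model = EncoderCT2fromHfHub(
+     model_name_or_path="michaelfeil/ct2fast-bge-base-en-v1.5",
+     device="cpu",
+     compute_type="int8",
+ )
+ ```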
+
2666
+ Converted on 2023-10-13.
2670
+
2671
+ # License and other remarks:
2672
+ This is just a quantized version. License conditions are intended to be identical to those of the original Hugging Face repo.
2673
+
2674
+ # Original description
2675
+
2676
+
2677
+
2678
+ <h1 align="center">FlagEmbedding</h1>
2679
+
2680
+
2681
+ <h4 align="center">
2682
+ <p>
2683
+ <a href="#model-list">Model List</a> |
2684
+ <a href="#frequently-asked-questions">FAQ</a> |
2685
+ <a href="#usage">Usage</a> |
2686
+ <a href="#evaluation">Evaluation</a> |
2687
+ <a href="#train">Train</a> |
2688
+ <a href="#contact">Contact</a> |
2689
+ <a href="#citation">Citation</a> |
2690
+ <a href="#license">License</a>
2691
+ </p>
2692
+ </h4>
2693
+
2694
+ For more details, please refer to our GitHub repo: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding).
2695
+
2696
+
2697
+ [English](README.md) | [中文](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md)
2698
+
2699
+ FlagEmbedding can map any text to a low-dimensional dense vector, which can be used for tasks like retrieval, classification, clustering, or semantic search.
2700
+ It can also be used in vector databases for LLMs.
2701
+
2702
+ ************* 🌟**Updates**🌟 *************
2703
+ - 10/12/2023: Release [LLM-Embedder](./FlagEmbedding/llm_embedder/README.md), a unified embedding model to support diverse retrieval augmentation needs for LLMs. [Paper](https://arxiv.org/pdf/2310.07554.pdf) :fire:
2704
+ - 09/15/2023: The [technical report](https://arxiv.org/pdf/2309.07597.pdf) of BGE has been released
2705
+ - 09/15/2023: The [massive training data](https://data.baai.ac.cn/details/BAAI-MTP) of BGE has been released
2706
+ - 09/12/2023: New models:
2707
+ - **New reranker models**: release cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than embedding models. We recommend using/fine-tuning them to re-rank the top-k documents returned by embedding models.
2708
+ - **Updated embedding models**: release `bge-*-v1.5` embedding models to alleviate the issue of the similarity distribution and enhance retrieval ability without instruction.
2709
+
2710
+
2711
+ <details>
2712
+ <summary>More</summary>
2713
+ <!-- ### More -->
2714
+
2715
+ - 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): add a script to mine hard negatives and support adding an instruction during fine-tuning.
2716
+ - 08/09/2023: BGE models are integrated into **Langchain**; you can use them like [this](#using-langchain). The C-MTEB **leaderboard** is [available](https://huggingface.co/spaces/mteb/leaderboard).
2717
+ - 08/05/2023: Release base-scale and small-scale models, with the **best performance among models of the same size 🤗**
2718
+ - 08/02/2023: Release `bge-large-*` (short for BAAI General Embedding) models, which **rank 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada:
2719
+ - 08/01/2023: We release the [Chinese Massive Text Embedding Benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (**C-MTEB**), consisting of 31 test datasets.
2720
+
2721
+ </details>
2722
+
2723
+
2724
+ ## Model List
2725
+
2726
+ `bge` is short for `BAAI general embedding`.
2727
+
2728
+ | Model | Language | | Description | query instruction for retrieval [1] |
2729
+ |:-------------------------------|:--------:| :--------:| :--------:|:--------:|
2730
+ | [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See [README](./FlagEmbedding/llm_embedder/README.md) |
2731
+ | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
2732
+ | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
2733
+ | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
2734
+ | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
2735
+ | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
2736
+ | [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
2737
+ | [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
2738
+ | [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
2739
+ | [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
2740
+ | [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-en` | `Represent this sentence for searching relevant passages: ` |
2741
+ | [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) |a small-scale model but with competitive performance | `Represent this sentence for searching relevant passages: ` |
2742
+ | [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章:` |
2743
+ | [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章:` |
2744
+ | [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `为这个句子生成表示以用于检索相关文章:` |
2745
+
2746
+
2747
+ [1\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, and you can just use the original query directly. In all cases, **no instruction** needs to be added to passages.
2748
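+
+ For the English `bge` models, this simply means prefixing the query string with the instruction from the Model List above, e.g.:
+
+ ```python
+ instruction = "Represent this sentence for searching relevant passages: "
+ query = "what is panda?"
+ query_with_instruction = instruction + query  # passages are embedded without the instruction
+ ```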
+
2749
+ [2\]: Different from an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, cross-encoders are widely used to re-rank the top-k documents retrieved by simpler models.
2750
+ For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results.
2751
+
2752
+ All models have been uploaded to the Hugging Face Hub; you can find them at https://huggingface.co/BAAI.
2753
+ If you cannot access the Hugging Face Hub, you can also download the models at https://model.baai.ac.cn/models .
2754
+
2755
+
2756
+ ## Frequently asked questions
2757
+
2758
+ <details>
2759
+ <summary>1. How to fine-tune bge embedding model?</summary>
2760
+
2761
+ <!-- ### How to fine-tune bge embedding model? -->
2762
+ Follow this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) to prepare data and fine-tune your model.
2763
+ Some suggestions:
2764
+ - Mine hard negatives following this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives), which can improve retrieval performance.
2765
+ - If you pre-train bge on your data, the pre-trained model cannot be used to calculate similarity directly; it must be fine-tuned with contrastive learning before computing similarity.
2766
+ - If the accuracy of the fine-tuned model is still not high enough, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
2767
+
2768
+
2769
+ </details>
2770
+
2771
+ <details>
2772
+ <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary>
2773
+
2774
+ <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 -->
2775
+ **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.**
2776
+
2777
+ Since we fine-tune the models by contrastive learning with a temperature of 0.01,
2778
+ the similarity distribution of the current BGE models lies roughly in the interval \[0.6, 1\].
2779
+ So a similarity score greater than 0.5 does not indicate that the two sentences are similar.
2780
+
2781
+ For downstream tasks, such as passage retrieval or semantic similarity,
2782
+ **what matters is the relative order of the scores, not the absolute value.**
2783
+ If you need to filter similar sentences based on a similarity threshold,
2784
+ please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9).
2785
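+
+ For illustration, a minimal sketch of such threshold filtering, assuming `embeddings` is a numpy array of L2-normalized sentence embeddings (the 0.85 threshold is only an example value; tune it on your own data):
+
+ ```python
+ # cosine similarity reduces to a dot product for L2-normalized vectors
+ sims = embeddings @ embeddings.T
+ threshold = 0.85  # example value; choose based on your data's similarity distribution
+ similar_pairs = [(i, j) for i in range(len(sims))
+                  for j in range(i + 1, len(sims)) if sims[i, j] >= threshold]
+ ```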
+
2786
+ </details>
2787
+
2788
+ <details>
2789
+ <summary>3. When does the query instruction need to be used</summary>
2790
+
2791
+ <!-- ### When does the query instruction need to be used -->
2792
+
2793
+ For `bge-*-v1.5`, we improved its retrieval ability when no instruction is used.
2794
+ Using no instruction causes only a slight degradation in retrieval performance compared with using an instruction.
2795
+ So, for convenience, you can generate embeddings without an instruction in all cases.
2796
+
2797
+ For a retrieval task that uses short queries to find long related documents,
2798
+ it is recommended to add instructions for these short queries.
2799
+ **The best way to decide whether to add instructions to queries is to choose the setting that achieves better performance on your task.**
2800
+ In all cases, the instruction does not need to be added to documents/passages.
2801
+
2802
+ </details>
2803
+
2804
+
2805
+ ## Usage
2806
+
2807
+ ### Usage for Embedding Model
2808
+
2809
+ Here are some examples for using `bge` models with
2810
+ [FlagEmbedding](#using-flagembedding), [Sentence-Transformers](#using-sentence-transformers), [Langchain](#using-langchain), or [Huggingface Transformers](#using-huggingface-transformers).
2811
+
2812
+ #### Using FlagEmbedding
2813
+ ```
2814
+ pip install -U FlagEmbedding
2815
+ ```
2816
+ If that doesn't work for you, see [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md) for other installation methods.
2817
+
2818
+ ```python
2819
+ from FlagEmbedding import FlagModel
2820
+ sentences_1 = ["样例数据-1", "样例数据-2"]
2821
+ sentences_2 = ["样例数据-3", "样例数据-4"]
2822
+ model = FlagModel('BAAI/bge-large-zh-v1.5',
2823
+ query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
2824
+ use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
2825
+ embeddings_1 = model.encode(sentences_1)
2826
+ embeddings_2 = model.encode(sentences_2)
2827
+ similarity = embeddings_1 @ embeddings_2.T
2828
+ print(similarity)
2829
+
2830
+ # for an s2p (short query to long passage) retrieval task, we suggest using encode_queries(), which automatically adds the instruction to each query
2831
+ # the corpus in a retrieval task can still use encode() or encode_corpus(), since passages don't need the instruction
2832
+ queries = ['query_1', 'query_2']
2833
+ passages = ["样例文档-1", "样例文档-2"]
2834
+ q_embeddings = model.encode_queries(queries)
2835
+ p_embeddings = model.encode(passages)
2836
+ scores = q_embeddings @ p_embeddings.T
2837
+ ```
2838
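+
+ The resulting `scores` matrix has one row per query and one column per passage; a short sketch of turning it into a per-query ranking (assuming the numpy arrays returned above):
+
+ ```python
+ import numpy as np
+
+ # higher score means more relevant
+ top_k = np.argsort(-scores, axis=1)[:, :3]  # indices of the top-3 passages per query
+ for qi, passage_ids in enumerate(top_k):
+     print(queries[qi], [passages[i] for i in passage_ids])
+ ```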
+ For the value of the argument `query_instruction_for_retrieval`, see [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list).
2839
+
2840
+ By default, FlagModel will use all available GPUs when encoding. Please set `os.environ["CUDA_VISIBLE_DEVICES"]` to select specific GPUs.
2841
+ You can also set `os.environ["CUDA_VISIBLE_DEVICES"]=""` to make all GPUs unavailable.
2842
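+
+ For instance (a sketch; note that the environment variable must be set before CUDA is initialized):
+
+ ```python
+ import os
+ os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # restrict encoding to GPU 0
+ # os.environ["CUDA_VISIBLE_DEVICES"] = ""  # or hide all GPUs
+
+ from FlagEmbedding import FlagModel
+ model = FlagModel('BAAI/bge-base-en-v1.5')
+ ```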
+
2843
+
2844
+ #### Using Sentence-Transformers
2845
+
2846
+ You can also use the `bge` models with [sentence-transformers](https://www.SBERT.net):
2847
+
2848
+ ```
2849
+ pip install -U sentence-transformers
2850
+ ```
2851
+ ```python
2852
+ from sentence_transformers import SentenceTransformer
2853
+ sentences_1 = ["样例数据-1", "样例数据-2"]
2854
+ sentences_2 = ["样例数据-3", "样例数据-4"]
2855
+ model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
2856
+ embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
2857
+ embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
2858
+ similarity = embeddings_1 @ embeddings_2.T
2859
+ print(similarity)
2860
+ ```
2861
+ For an s2p (short query to long passage) retrieval task,
2862
+ each short query should start with an instruction (see [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list) for the instructions).
2863
+ But the instruction is not needed for passages.
2864
+ ```python
2865
+ from sentence_transformers import SentenceTransformer
2866
+ queries = ['query_1', 'query_2']
2867
+ passages = ["样例文档-1", "样例文档-2"]
2868
+ instruction = "为这个句子生成表示以用于检索相关文章:"
2869
+
2870
+ model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
2871
+ q_embeddings = model.encode([instruction+q for q in queries], normalize_embeddings=True)
2872
+ p_embeddings = model.encode(passages, normalize_embeddings=True)
2873
+ scores = q_embeddings @ p_embeddings.T
2874
+ ```
2875
+
2876
+ #### Using Langchain
2877
+
2878
+ You can use `bge` in langchain like this:
2879
+ ```python
2880
+ from langchain.embeddings import HuggingFaceBgeEmbeddings
2881
+ model_name = "BAAI/bge-large-en-v1.5"
2882
+ model_kwargs = {'device': 'cuda'}
2883
+ encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity
2884
+ model = HuggingFaceBgeEmbeddings(
2885
+ model_name=model_name,
2886
+ model_kwargs=model_kwargs,
2887
+ encode_kwargs=encode_kwargs,
2888
+ query_instruction="为这个句子生成表示以用于检索相关文章:"
2889
+ )
2890
+ model.query_instruction = "为这个句子生成表示以用于检索相关文章:"
2891
+ ```
2892
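+
+ Once constructed, the wrapper follows the standard Langchain embeddings interface; a brief usage sketch:
+
+ ```python
+ # the query_instruction is prepended to queries automatically
+ query_embedding = model.embed_query("what is panda?")
+ # no instruction is added to documents/passages
+ doc_embeddings = model.embed_documents(["The giant panda is a bear species endemic to China."])
+ ```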
+
2893
+
2894
+ #### Using HuggingFace Transformers
2895
+
2896
+ With the transformers package, you can use the model like this: first, pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.
2897
+
2898
+ ```python
2899
+ from transformers import AutoTokenizer, AutoModel
2900
+ import torch
2901
+ # Sentences we want sentence embeddings for
2902
+ sentences = ["样例数据-1", "样例数据-2"]
2903
+
2904
+ # Load model from HuggingFace Hub
2905
+ tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-zh-v1.5')
2906
+ model = AutoModel.from_pretrained('BAAI/bge-large-zh-v1.5')
2907
+ model.eval()
2908
+
2909
+ # Tokenize sentences
2910
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
2911
+ # for s2p(short query to long passage) retrieval task, add an instruction to query (not add instruction for passages)
2912
+ # encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')
2913
+
2914
+ # Compute token embeddings
2915
+ with torch.no_grad():
2916
+ model_output = model(**encoded_input)
2917
+ # Perform pooling. In this case, cls pooling.
2918
+ sentence_embeddings = model_output[0][:, 0]
2919
+ # normalize embeddings
2920
+ sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
2921
+ print("Sentence embeddings:", sentence_embeddings)
2922
+ ```
2923
+
2924
+ ### Usage for Reranker
2925
+
2926
+ Different from an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding.
2927
+ You can get a relevance score by feeding a query and a passage to the reranker.
2928
+ The reranker is optimized based on cross-entropy loss, so the relevance score is not bounded to a specific range.
2929
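+
+ If you need scores in a fixed range (e.g., for thresholding), one common post-processing choice, which is not part of the model itself, is to map the raw logits through a sigmoid:
+
+ ```python
+ import torch
+
+ raw_scores = torch.tensor([-1.2, 3.4])  # example raw reranker logits
+ normalized = torch.sigmoid(raw_scores)  # maps each logit into (0, 1)
+ ```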
+
2930
+
2931
+ #### Using FlagEmbedding
2932
+ ```
2933
+ pip install -U FlagEmbedding
2934
+ ```
2935
+
2936
+ Get relevance scores (higher scores indicate more relevance):
2937
+ ```python
2938
+ from FlagEmbedding import FlagReranker
2939
+ reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
2940
+
2941
+ score = reranker.compute_score(['query', 'passage'])
2942
+ print(score)
2943
+
2944
+ scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
2945
+ print(scores)
2946
+ ```
2947
+
2948
+
2949
+ #### Using Huggingface transformers
2950
+
2951
+ ```python
2952
+ import torch
2953
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
2954
+
2955
+ tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
2956
+ model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
2957
+ model.eval()
2958
+
2959
+ pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
2960
+ with torch.no_grad():
2961
+ inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
2962
+ scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
2963
+ print(scores)
2964
+ ```
2965
+
2966
+ ## Evaluation
2967
+
2968
+ `baai-general-embedding` models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!**
2969
+ For more details and evaluation tools, see our [scripts](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md).
2970
+
2971
+ - **MTEB**:
2972
+
2973
+ | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
2974
+ |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
2975
+ | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
2976
+ | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
2977
+ | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
2978
+ | [bge-large-en](https://huggingface.co/BAAI/bge-large-en) | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
2979
+ | [bge-base-en](https://huggingface.co/BAAI/bge-base-en) | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
2980
+ | [gte-large](https://huggingface.co/thenlper/gte-large) | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
2981
+ | [gte-base](https://huggingface.co/thenlper/gte-base) | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
2982
+ | [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
2983
+ | [bge-small-en](https://huggingface.co/BAAI/bge-small-en) | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
2984
+ | [instructor-xl](https://huggingface.co/hkunlp/instructor-xl) | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
2985
+ | [e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
2986
+ | [gte-small](https://huggingface.co/thenlper/gte-small) | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
2987
+ | [text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings) | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
2988
+ | [e5-small-v2](https://huggingface.co/intfloat/e5-base-v2) | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
2989
+ | [sentence-t5-xxl](https://huggingface.co/sentence-transformers/sentence-t5-xxl) | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
2990
+ | [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
2991
+ | [sgpt-bloom-7b1-msmarco](https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco) | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |
2992
+
2993
+
2994
+
2995
+ - **C-MTEB**:
2996
+ We created the benchmark C-MTEB for Chinese text embedding, which consists of 31 datasets from 6 tasks.
2997
+ Please refer to [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md) for a detailed introduction.
2998
+
2999
+ | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
3000
+ |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
3001
+ | [**BAAI/bge-large-zh-v1.5**](https://huggingface.co/BAAI/bge-large-zh-v1.5) | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
3002
+ | [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
3003
+ | [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
3004
+ | [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
3005
+ | [bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct) | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
3006
+ | [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
3007
+ | [multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
3008
+ | [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
3009
+ | [m3e-base](https://huggingface.co/moka-ai/m3e-base) | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
3010
+ | [m3e-large](https://huggingface.co/moka-ai/m3e-large) | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
3011
+ | [multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
3012
+ | [multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
3013
+ | [text-embedding-ada-002(OpenAI)](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
3014
+ | [luotuo](https://huggingface.co/silk-road/luotuo-bert-medium) | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
3015
+ | [text2vec-base](https://huggingface.co/shibing624/text2vec-base-chinese) | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
3016
+ | [text2vec-large](https://huggingface.co/GanymedeNil/text2vec-large-chinese) | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |
3017
+
3018
+
3019
+ - **Reranking**:
3020
+ See [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/) for the evaluation script.
3021
+
3022
+ | Model | T2Reranking | T2RerankingZh2En\* | T2RerankingEn2Zh\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
3023
+ |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
3024
+ | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
3025
+ | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
3026
+ | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
3027
+ | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
3028
+ | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
3029
+ | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
3030
+ | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
3031
+ | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
3032
+ | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
3033
+ | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 |
3034
+
3035
+ \* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks.
3036
+
3037
+ ## Train
3038
+
3039
+ ### BAAI Embedding
3040
+
3041
+ We pre-train the models using [retromae](https://github.com/staoxiao/RetroMAE) and train them on large-scale pair data using contrastive learning.
3042
+ **You can fine-tune the embedding model on your data following our [examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune).**
3043
+ We also provide a [pre-train example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain).
3044
+ Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned first.
3045
+ For more training details for bge, see [baai_general_embedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md).
3046
+
3047
+
3048
+
3049
+ ### BGE Reranker
3050
+
3051
+ A cross-encoder performs full attention over the input pair,
3052
+ which is more accurate than an embedding model (i.e., a bi-encoder) but also more time-consuming.
3053
+ Therefore, it can be used to re-rank the top-k documents returned by an embedding model.
3054
+ We train the cross-encoder on multilingual pair data.
3055
+ The data format is the same as for the embedding model, so you can fine-tune it easily following our [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker).
3056
+ For more details, please refer to [./FlagEmbedding/reranker/README.md](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker).
3057
+
3058
+
3059
+ ## Contact
3060
+ If you have any questions or suggestions related to this project, feel free to open an issue or pull request.
3061
+ You can also email Shitao Xiao (stxiao@baai.ac.cn) and Zheng Liu (liuzheng@baai.ac.cn).
3062
+
3063
+
3064
+ ## Citation
3065
+
3066
+ If you find this repository useful, please consider giving it a star :star: and a citation.
3067
+
3068
+ ```
3069
+ @misc{bge_embedding,
3070
+ title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
3071
+ author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
3072
+ year={2023},
3073
+ eprint={2309.07597},
3074
+ archivePrefix={arXiv},
3075
+ primaryClass={cs.CL}
3076
+ }
3077
+ ```
3078
+
3079
+ ## License
3080
+ FlagEmbedding is licensed under the [MIT License](https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE). The released models can be used for commercial purposes free of charge.
3081
+
config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "_name_or_path": "/root/.cache/torch/sentence_transformers/BAAI_bge-base-en/",
3
+ "architectures": [
4
+ "BertModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "gradient_checkpointing": false,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 768,
12
+ "id2label": {
13
+ "0": "LABEL_0"
14
+ },
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 3072,
17
+ "label2id": {
18
+ "LABEL_0": 0
19
+ },
20
+ "layer_norm_eps": 1e-12,
21
+ "max_position_embeddings": 512,
22
+ "model_type": "bert",
23
+ "num_attention_heads": 12,
24
+ "num_hidden_layers": 12,
25
+ "pad_token_id": 0,
26
+ "position_embedding_type": "absolute",
27
+ "torch_dtype": "float32",
28
+ "transformers_version": "4.30.0",
29
+ "type_vocab_size": 2,
30
+ "use_cache": true,
31
+ "vocab_size": 30522,
32
+ "bos_token": "<s>",
33
+ "eos_token": "</s>",
34
+ "layer_norm_epsilon": 1e-12,
35
+ "unk_token": "[UNK]"
36
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "2.2.2",
4
+ "transformers": "4.28.1",
5
+ "pytorch": "1.13.0+cu117"
6
+ }
7
+ }
model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:72604c1988ca1e4829e71d6217470b8e7bdd80cefe498f48f942029c78f5a973
3
+ size 218972844
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": true
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "clean_up_tokenization_spaces": true,
3
+ "cls_token": "[CLS]",
4
+ "do_basic_tokenize": true,
5
+ "do_lower_case": true,
6
+ "mask_token": "[MASK]",
7
+ "model_max_length": 512,
8
+ "never_split": null,
9
+ "pad_token": "[PAD]",
10
+ "sep_token": "[SEP]",
11
+ "strip_accents": null,
12
+ "tokenize_chinese_chars": true,
13
+ "tokenizer_class": "BertTokenizer",
14
+ "unk_token": "[UNK]"
15
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.txt ADDED
The diff for this file is too large to render. See raw diff