michaelfeil committed on
Commit 55d4ae0 · 1 Parent(s): e77dee5

Upload BAAI/bge-small-en-v1.5 ctranslate2 weights

README.md ADDED
@@ -0,0 +1,3081 @@
+ ---
+ tags:
+ - ctranslate2
+ - int8
+ - float16
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - transformers
+ - mteb
+ model-index:
+ - name: bge-small-en-v1.5
+   results:
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_counterfactual
+       name: MTEB AmazonCounterfactualClassification (en)
+       config: en
+       split: test
+       revision: e8379541af4e31359cca9fbcf4b00f2671dba205
+     metrics:
+     - type: accuracy
+       value: 73.79104477611939
+     - type: ap
+       value: 37.21923821573361
+     - type: f1
+       value: 68.0914945617093
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_polarity
+       name: MTEB AmazonPolarityClassification
+       config: default
+       split: test
+       revision: e2d317d38cd51312af73b3d32a06d1a08b442046
+     metrics:
+     - type: accuracy
+       value: 92.75377499999999
+     - type: ap
+       value: 89.46766124546022
+     - type: f1
+       value: 92.73884001331487
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_reviews_multi
+       name: MTEB AmazonReviewsClassification (en)
+       config: en
+       split: test
+       revision: 1399c76144fd37290681b995c656ef9b2e06e26d
+     metrics:
+     - type: accuracy
+       value: 46.986
+     - type: f1
+       value: 46.55936786727896
+   - task:
+       type: Retrieval
+     dataset:
+       type: arguana
+       name: MTEB ArguAna
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 35.846000000000004
+     - type: map_at_10
+       value: 51.388
+     - type: map_at_100
+       value: 52.132999999999996
+     - type: map_at_1000
+       value: 52.141000000000005
+     - type: map_at_3
+       value: 47.037
+     - type: map_at_5
+       value: 49.579
+     - type: mrr_at_1
+       value: 36.558
+     - type: mrr_at_10
+       value: 51.658
+     - type: mrr_at_100
+       value: 52.402
+     - type: mrr_at_1000
+       value: 52.410000000000004
+     - type: mrr_at_3
+       value: 47.345
+     - type: mrr_at_5
+       value: 49.797999999999995
+     - type: ndcg_at_1
+       value: 35.846000000000004
+     - type: ndcg_at_10
+       value: 59.550000000000004
+     - type: ndcg_at_100
+       value: 62.596
+     - type: ndcg_at_1000
+       value: 62.759
+     - type: ndcg_at_3
+       value: 50.666999999999994
+     - type: ndcg_at_5
+       value: 55.228
+     - type: precision_at_1
+       value: 35.846000000000004
+     - type: precision_at_10
+       value: 8.542
+     - type: precision_at_100
+       value: 0.984
+     - type: precision_at_1000
+       value: 0.1
+     - type: precision_at_3
+       value: 20.389
+     - type: precision_at_5
+       value: 14.438
+     - type: recall_at_1
+       value: 35.846000000000004
+     - type: recall_at_10
+       value: 85.42
+     - type: recall_at_100
+       value: 98.43499999999999
+     - type: recall_at_1000
+       value: 99.644
+     - type: recall_at_3
+       value: 61.166
+     - type: recall_at_5
+       value: 72.191
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/arxiv-clustering-p2p
+       name: MTEB ArxivClusteringP2P
+       config: default
+       split: test
+       revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
+     metrics:
+     - type: v_measure
+       value: 47.402770198163594
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/arxiv-clustering-s2s
+       name: MTEB ArxivClusteringS2S
+       config: default
+       split: test
+       revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
+     metrics:
+     - type: v_measure
+       value: 40.01545436974177
+   - task:
+       type: Reranking
+     dataset:
+       type: mteb/askubuntudupquestions-reranking
+       name: MTEB AskUbuntuDupQuestions
+       config: default
+       split: test
+       revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
+     metrics:
+     - type: map
+       value: 62.586465273207196
+     - type: mrr
+       value: 74.42169019038825
+   - task:
+       type: STS
+     dataset:
+       type: mteb/biosses-sts
+       name: MTEB BIOSSES
+       config: default
+       split: test
+       revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
+     metrics:
+     - type: cos_sim_pearson
+       value: 85.1891186537969
+     - type: cos_sim_spearman
+       value: 83.75492046087288
+     - type: euclidean_pearson
+       value: 84.11766204805357
+     - type: euclidean_spearman
+       value: 84.01456493126516
+     - type: manhattan_pearson
+       value: 84.2132950502772
+     - type: manhattan_spearman
+       value: 83.89227298813377
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/banking77
+       name: MTEB Banking77Classification
+       config: default
+       split: test
+       revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
+     metrics:
+     - type: accuracy
+       value: 85.74025974025975
+     - type: f1
+       value: 85.71493566466381
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/biorxiv-clustering-p2p
+       name: MTEB BiorxivClusteringP2P
+       config: default
+       split: test
+       revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
+     metrics:
+     - type: v_measure
+       value: 38.467181385006434
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/biorxiv-clustering-s2s
+       name: MTEB BiorxivClusteringS2S
+       config: default
+       split: test
+       revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
+     metrics:
+     - type: v_measure
+       value: 34.719496037339056
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackAndroidRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 29.587000000000003
+     - type: map_at_10
+       value: 41.114
+     - type: map_at_100
+       value: 42.532
+     - type: map_at_1000
+       value: 42.661
+     - type: map_at_3
+       value: 37.483
+     - type: map_at_5
+       value: 39.652
+     - type: mrr_at_1
+       value: 36.338
+     - type: mrr_at_10
+       value: 46.763
+     - type: mrr_at_100
+       value: 47.393
+     - type: mrr_at_1000
+       value: 47.445
+     - type: mrr_at_3
+       value: 43.538
+     - type: mrr_at_5
+       value: 45.556000000000004
+     - type: ndcg_at_1
+       value: 36.338
+     - type: ndcg_at_10
+       value: 47.658
+     - type: ndcg_at_100
+       value: 52.824000000000005
+     - type: ndcg_at_1000
+       value: 54.913999999999994
+     - type: ndcg_at_3
+       value: 41.989
+     - type: ndcg_at_5
+       value: 44.944
+     - type: precision_at_1
+       value: 36.338
+     - type: precision_at_10
+       value: 9.156
+     - type: precision_at_100
+       value: 1.4789999999999999
+     - type: precision_at_1000
+       value: 0.196
+     - type: precision_at_3
+       value: 20.076
+     - type: precision_at_5
+       value: 14.85
+     - type: recall_at_1
+       value: 29.587000000000003
+     - type: recall_at_10
+       value: 60.746
+     - type: recall_at_100
+       value: 82.157
+     - type: recall_at_1000
+       value: 95.645
+     - type: recall_at_3
+       value: 44.821
+     - type: recall_at_5
+       value: 52.819
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackEnglishRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 30.239
+     - type: map_at_10
+       value: 39.989000000000004
+     - type: map_at_100
+       value: 41.196
+     - type: map_at_1000
+       value: 41.325
+     - type: map_at_3
+       value: 37.261
+     - type: map_at_5
+       value: 38.833
+     - type: mrr_at_1
+       value: 37.516
+     - type: mrr_at_10
+       value: 46.177
+     - type: mrr_at_100
+       value: 46.806
+     - type: mrr_at_1000
+       value: 46.849000000000004
+     - type: mrr_at_3
+       value: 44.002
+     - type: mrr_at_5
+       value: 45.34
+     - type: ndcg_at_1
+       value: 37.516
+     - type: ndcg_at_10
+       value: 45.586
+     - type: ndcg_at_100
+       value: 49.897000000000006
+     - type: ndcg_at_1000
+       value: 51.955
+     - type: ndcg_at_3
+       value: 41.684
+     - type: ndcg_at_5
+       value: 43.617
+     - type: precision_at_1
+       value: 37.516
+     - type: precision_at_10
+       value: 8.522
+     - type: precision_at_100
+       value: 1.374
+     - type: precision_at_1000
+       value: 0.184
+     - type: precision_at_3
+       value: 20.105999999999998
+     - type: precision_at_5
+       value: 14.152999999999999
+     - type: recall_at_1
+       value: 30.239
+     - type: recall_at_10
+       value: 55.03
+     - type: recall_at_100
+       value: 73.375
+     - type: recall_at_1000
+       value: 86.29599999999999
+     - type: recall_at_3
+       value: 43.269000000000005
+     - type: recall_at_5
+       value: 48.878
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackGamingRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 38.338
+     - type: map_at_10
+       value: 50.468999999999994
+     - type: map_at_100
+       value: 51.553000000000004
+     - type: map_at_1000
+       value: 51.608
+     - type: map_at_3
+       value: 47.107
+     - type: map_at_5
+       value: 49.101
+     - type: mrr_at_1
+       value: 44.201
+     - type: mrr_at_10
+       value: 54.057
+     - type: mrr_at_100
+       value: 54.764
+     - type: mrr_at_1000
+       value: 54.791000000000004
+     - type: mrr_at_3
+       value: 51.56699999999999
+     - type: mrr_at_5
+       value: 53.05
+     - type: ndcg_at_1
+       value: 44.201
+     - type: ndcg_at_10
+       value: 56.379000000000005
+     - type: ndcg_at_100
+       value: 60.645
+     - type: ndcg_at_1000
+       value: 61.73499999999999
+     - type: ndcg_at_3
+       value: 50.726000000000006
+     - type: ndcg_at_5
+       value: 53.58500000000001
+     - type: precision_at_1
+       value: 44.201
+     - type: precision_at_10
+       value: 9.141
+     - type: precision_at_100
+       value: 1.216
+     - type: precision_at_1000
+       value: 0.135
+     - type: precision_at_3
+       value: 22.654
+     - type: precision_at_5
+       value: 15.723999999999998
+     - type: recall_at_1
+       value: 38.338
+     - type: recall_at_10
+       value: 70.30499999999999
+     - type: recall_at_100
+       value: 88.77199999999999
+     - type: recall_at_1000
+       value: 96.49799999999999
+     - type: recall_at_3
+       value: 55.218
+     - type: recall_at_5
+       value: 62.104000000000006
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackGisRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 25.682
+     - type: map_at_10
+       value: 33.498
+     - type: map_at_100
+       value: 34.461000000000006
+     - type: map_at_1000
+       value: 34.544000000000004
+     - type: map_at_3
+       value: 30.503999999999998
+     - type: map_at_5
+       value: 32.216
+     - type: mrr_at_1
+       value: 27.683999999999997
+     - type: mrr_at_10
+       value: 35.467999999999996
+     - type: mrr_at_100
+       value: 36.32
+     - type: mrr_at_1000
+       value: 36.386
+     - type: mrr_at_3
+       value: 32.618
+     - type: mrr_at_5
+       value: 34.262
+     - type: ndcg_at_1
+       value: 27.683999999999997
+     - type: ndcg_at_10
+       value: 38.378
+     - type: ndcg_at_100
+       value: 43.288
+     - type: ndcg_at_1000
+       value: 45.413
+     - type: ndcg_at_3
+       value: 32.586
+     - type: ndcg_at_5
+       value: 35.499
+     - type: precision_at_1
+       value: 27.683999999999997
+     - type: precision_at_10
+       value: 5.864
+     - type: precision_at_100
+       value: 0.882
+     - type: precision_at_1000
+       value: 0.11
+     - type: precision_at_3
+       value: 13.446
+     - type: precision_at_5
+       value: 9.718
+     - type: recall_at_1
+       value: 25.682
+     - type: recall_at_10
+       value: 51.712
+     - type: recall_at_100
+       value: 74.446
+     - type: recall_at_1000
+       value: 90.472
+     - type: recall_at_3
+       value: 36.236000000000004
+     - type: recall_at_5
+       value: 43.234
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackMathematicaRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 16.073999999999998
+     - type: map_at_10
+       value: 24.352999999999998
+     - type: map_at_100
+       value: 25.438
+     - type: map_at_1000
+       value: 25.545
+     - type: map_at_3
+       value: 21.614
+     - type: map_at_5
+       value: 23.104
+     - type: mrr_at_1
+       value: 19.776
+     - type: mrr_at_10
+       value: 28.837000000000003
+     - type: mrr_at_100
+       value: 29.755
+     - type: mrr_at_1000
+       value: 29.817
+     - type: mrr_at_3
+       value: 26.201999999999998
+     - type: mrr_at_5
+       value: 27.714
+     - type: ndcg_at_1
+       value: 19.776
+     - type: ndcg_at_10
+       value: 29.701
+     - type: ndcg_at_100
+       value: 35.307
+     - type: ndcg_at_1000
+       value: 37.942
+     - type: ndcg_at_3
+       value: 24.764
+     - type: ndcg_at_5
+       value: 27.025
+     - type: precision_at_1
+       value: 19.776
+     - type: precision_at_10
+       value: 5.659
+     - type: precision_at_100
+       value: 0.971
+     - type: precision_at_1000
+       value: 0.133
+     - type: precision_at_3
+       value: 12.065
+     - type: precision_at_5
+       value: 8.905000000000001
+     - type: recall_at_1
+       value: 16.073999999999998
+     - type: recall_at_10
+       value: 41.647
+     - type: recall_at_100
+       value: 66.884
+     - type: recall_at_1000
+       value: 85.91499999999999
+     - type: recall_at_3
+       value: 27.916
+     - type: recall_at_5
+       value: 33.729
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackPhysicsRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 28.444999999999997
+     - type: map_at_10
+       value: 38.218999999999994
+     - type: map_at_100
+       value: 39.595
+     - type: map_at_1000
+       value: 39.709
+     - type: map_at_3
+       value: 35.586
+     - type: map_at_5
+       value: 36.895
+     - type: mrr_at_1
+       value: 34.841
+     - type: mrr_at_10
+       value: 44.106
+     - type: mrr_at_100
+       value: 44.98
+     - type: mrr_at_1000
+       value: 45.03
+     - type: mrr_at_3
+       value: 41.979
+     - type: mrr_at_5
+       value: 43.047999999999995
+     - type: ndcg_at_1
+       value: 34.841
+     - type: ndcg_at_10
+       value: 43.922
+     - type: ndcg_at_100
+       value: 49.504999999999995
+     - type: ndcg_at_1000
+       value: 51.675000000000004
+     - type: ndcg_at_3
+       value: 39.858
+     - type: ndcg_at_5
+       value: 41.408
+     - type: precision_at_1
+       value: 34.841
+     - type: precision_at_10
+       value: 7.872999999999999
+     - type: precision_at_100
+       value: 1.2449999999999999
+     - type: precision_at_1000
+       value: 0.161
+     - type: precision_at_3
+       value: 18.993
+     - type: precision_at_5
+       value: 13.032
+     - type: recall_at_1
+       value: 28.444999999999997
+     - type: recall_at_10
+       value: 54.984
+     - type: recall_at_100
+       value: 78.342
+     - type: recall_at_1000
+       value: 92.77
+     - type: recall_at_3
+       value: 42.842999999999996
+     - type: recall_at_5
+       value: 47.247
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackProgrammersRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 23.072
+     - type: map_at_10
+       value: 32.354
+     - type: map_at_100
+       value: 33.800000000000004
+     - type: map_at_1000
+       value: 33.908
+     - type: map_at_3
+       value: 29.232000000000003
+     - type: map_at_5
+       value: 31.049
+     - type: mrr_at_1
+       value: 29.110000000000003
+     - type: mrr_at_10
+       value: 38.03
+     - type: mrr_at_100
+       value: 39.032
+     - type: mrr_at_1000
+       value: 39.086999999999996
+     - type: mrr_at_3
+       value: 35.407
+     - type: mrr_at_5
+       value: 36.76
+     - type: ndcg_at_1
+       value: 29.110000000000003
+     - type: ndcg_at_10
+       value: 38.231
+     - type: ndcg_at_100
+       value: 44.425
+     - type: ndcg_at_1000
+       value: 46.771
+     - type: ndcg_at_3
+       value: 33.095
+     - type: ndcg_at_5
+       value: 35.459
+     - type: precision_at_1
+       value: 29.110000000000003
+     - type: precision_at_10
+       value: 7.215000000000001
+     - type: precision_at_100
+       value: 1.2109999999999999
+     - type: precision_at_1000
+       value: 0.157
+     - type: precision_at_3
+       value: 16.058
+     - type: precision_at_5
+       value: 11.644
+     - type: recall_at_1
+       value: 23.072
+     - type: recall_at_10
+       value: 50.285999999999994
+     - type: recall_at_100
+       value: 76.596
+     - type: recall_at_1000
+       value: 92.861
+     - type: recall_at_3
+       value: 35.702
+     - type: recall_at_5
+       value: 42.152
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 24.937916666666666
+     - type: map_at_10
+       value: 33.755250000000004
+     - type: map_at_100
+       value: 34.955999999999996
+     - type: map_at_1000
+       value: 35.070499999999996
+     - type: map_at_3
+       value: 30.98708333333333
+     - type: map_at_5
+       value: 32.51491666666666
+     - type: mrr_at_1
+       value: 29.48708333333333
+     - type: mrr_at_10
+       value: 37.92183333333334
+     - type: mrr_at_100
+       value: 38.76583333333333
+     - type: mrr_at_1000
+       value: 38.82466666666667
+     - type: mrr_at_3
+       value: 35.45125
+     - type: mrr_at_5
+       value: 36.827000000000005
+     - type: ndcg_at_1
+       value: 29.48708333333333
+     - type: ndcg_at_10
+       value: 39.05225
+     - type: ndcg_at_100
+       value: 44.25983333333334
+     - type: ndcg_at_1000
+       value: 46.568333333333335
+     - type: ndcg_at_3
+       value: 34.271583333333325
+     - type: ndcg_at_5
+       value: 36.483916666666666
+     - type: precision_at_1
+       value: 29.48708333333333
+     - type: precision_at_10
+       value: 6.865749999999999
+     - type: precision_at_100
+       value: 1.1195833333333332
+     - type: precision_at_1000
+       value: 0.15058333333333335
+     - type: precision_at_3
+       value: 15.742083333333333
+     - type: precision_at_5
+       value: 11.221916666666667
+     - type: recall_at_1
+       value: 24.937916666666666
+     - type: recall_at_10
+       value: 50.650416666666665
+     - type: recall_at_100
+       value: 73.55383333333334
+     - type: recall_at_1000
+       value: 89.61691666666667
+     - type: recall_at_3
+       value: 37.27808333333334
+     - type: recall_at_5
+       value: 42.99475
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackStatsRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 23.947
+     - type: map_at_10
+       value: 30.575000000000003
+     - type: map_at_100
+       value: 31.465
+     - type: map_at_1000
+       value: 31.558000000000003
+     - type: map_at_3
+       value: 28.814
+     - type: map_at_5
+       value: 29.738999999999997
+     - type: mrr_at_1
+       value: 26.994
+     - type: mrr_at_10
+       value: 33.415
+     - type: mrr_at_100
+       value: 34.18
+     - type: mrr_at_1000
+       value: 34.245
+     - type: mrr_at_3
+       value: 31.621
+     - type: mrr_at_5
+       value: 32.549
+     - type: ndcg_at_1
+       value: 26.994
+     - type: ndcg_at_10
+       value: 34.482
+     - type: ndcg_at_100
+       value: 38.915
+     - type: ndcg_at_1000
+       value: 41.355
+     - type: ndcg_at_3
+       value: 31.139
+     - type: ndcg_at_5
+       value: 32.589
+     - type: precision_at_1
+       value: 26.994
+     - type: precision_at_10
+       value: 5.322
+     - type: precision_at_100
+       value: 0.8160000000000001
+     - type: precision_at_1000
+       value: 0.11100000000000002
+     - type: precision_at_3
+       value: 13.344000000000001
+     - type: precision_at_5
+       value: 8.988
+     - type: recall_at_1
+       value: 23.947
+     - type: recall_at_10
+       value: 43.647999999999996
+     - type: recall_at_100
+       value: 63.851
+     - type: recall_at_1000
+       value: 82.0
+     - type: recall_at_3
+       value: 34.288000000000004
+     - type: recall_at_5
+       value: 38.117000000000004
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackTexRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 16.197
+     - type: map_at_10
+       value: 22.968
+     - type: map_at_100
+       value: 24.095
+     - type: map_at_1000
+       value: 24.217
+     - type: map_at_3
+       value: 20.771
+     - type: map_at_5
+       value: 21.995
+     - type: mrr_at_1
+       value: 19.511
+     - type: mrr_at_10
+       value: 26.55
+     - type: mrr_at_100
+       value: 27.500999999999998
+     - type: mrr_at_1000
+       value: 27.578999999999997
+     - type: mrr_at_3
+       value: 24.421
+     - type: mrr_at_5
+       value: 25.604
+     - type: ndcg_at_1
+       value: 19.511
+     - type: ndcg_at_10
+       value: 27.386
+     - type: ndcg_at_100
+       value: 32.828
+     - type: ndcg_at_1000
+       value: 35.739
+     - type: ndcg_at_3
+       value: 23.405
+     - type: ndcg_at_5
+       value: 25.255
+     - type: precision_at_1
+       value: 19.511
+     - type: precision_at_10
+       value: 5.017
+     - type: precision_at_100
+       value: 0.91
+     - type: precision_at_1000
+       value: 0.133
+     - type: precision_at_3
+       value: 11.023
+     - type: precision_at_5
+       value: 8.025
+     - type: recall_at_1
+       value: 16.197
+     - type: recall_at_10
+       value: 37.09
+     - type: recall_at_100
+       value: 61.778
+     - type: recall_at_1000
+       value: 82.56599999999999
+     - type: recall_at_3
+       value: 26.034000000000002
+     - type: recall_at_5
+       value: 30.762
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackUnixRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 25.41
+     - type: map_at_10
+       value: 33.655
+     - type: map_at_100
+       value: 34.892
+     - type: map_at_1000
+       value: 34.995
+     - type: map_at_3
+       value: 30.94
+     - type: map_at_5
+       value: 32.303
+     - type: mrr_at_1
+       value: 29.477999999999998
+     - type: mrr_at_10
+       value: 37.443
+     - type: mrr_at_100
+       value: 38.383
+     - type: mrr_at_1000
+       value: 38.440000000000005
+     - type: mrr_at_3
+       value: 34.949999999999996
+     - type: mrr_at_5
+       value: 36.228
+     - type: ndcg_at_1
+       value: 29.477999999999998
+     - type: ndcg_at_10
+       value: 38.769
+     - type: ndcg_at_100
+       value: 44.245000000000005
+     - type: ndcg_at_1000
+       value: 46.593
+     - type: ndcg_at_3
+       value: 33.623
+     - type: ndcg_at_5
+       value: 35.766
+     - type: precision_at_1
+       value: 29.477999999999998
+     - type: precision_at_10
+       value: 6.455
+     - type: precision_at_100
+       value: 1.032
+     - type: precision_at_1000
+       value: 0.135
+     - type: precision_at_3
+       value: 14.893999999999998
+     - type: precision_at_5
+       value: 10.485
+     - type: recall_at_1
+       value: 25.41
+     - type: recall_at_10
+       value: 50.669
+     - type: recall_at_100
+       value: 74.084
+     - type: recall_at_1000
+       value: 90.435
+     - type: recall_at_3
+       value: 36.679
+     - type: recall_at_5
+       value: 41.94
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackWebmastersRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 23.339
+     - type: map_at_10
+       value: 31.852000000000004
+     - type: map_at_100
+       value: 33.411
+     - type: map_at_1000
+       value: 33.62
+     - type: map_at_3
+       value: 28.929
+     - type: map_at_5
+       value: 30.542
+     - type: mrr_at_1
+       value: 28.063
+     - type: mrr_at_10
+       value: 36.301
+     - type: mrr_at_100
+       value: 37.288
+     - type: mrr_at_1000
+       value: 37.349
+     - type: mrr_at_3
+       value: 33.663
+     - type: mrr_at_5
+       value: 35.165
+     - type: ndcg_at_1
+       value: 28.063
+     - type: ndcg_at_10
+       value: 37.462
+     - type: ndcg_at_100
+       value: 43.620999999999995
+     - type: ndcg_at_1000
+       value: 46.211
+     - type: ndcg_at_3
+       value: 32.68
+     - type: ndcg_at_5
+       value: 34.981
+     - type: precision_at_1
+       value: 28.063
+     - type: precision_at_10
+       value: 7.1739999999999995
+     - type: precision_at_100
+       value: 1.486
+     - type: precision_at_1000
+       value: 0.23500000000000001
+     - type: precision_at_3
+       value: 15.217
+     - type: precision_at_5
+       value: 11.265
+     - type: recall_at_1
+       value: 23.339
+     - type: recall_at_10
+       value: 48.376999999999995
+     - type: recall_at_100
+       value: 76.053
+     - type: recall_at_1000
+       value: 92.455
+     - type: recall_at_3
+       value: 34.735
+     - type: recall_at_5
+       value: 40.71
1045
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackWordpressRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 18.925
+ - type: map_at_10
+ value: 26.017000000000003
+ - type: map_at_100
+ value: 27.034000000000002
+ - type: map_at_1000
+ value: 27.156000000000002
+ - type: map_at_3
+ value: 23.604
+ - type: map_at_5
+ value: 24.75
+ - type: mrr_at_1
+ value: 20.333000000000002
+ - type: mrr_at_10
+ value: 27.915
+ - type: mrr_at_100
+ value: 28.788000000000004
+ - type: mrr_at_1000
+ value: 28.877999999999997
+ - type: mrr_at_3
+ value: 25.446999999999996
+ - type: mrr_at_5
+ value: 26.648
+ - type: ndcg_at_1
+ value: 20.333000000000002
+ - type: ndcg_at_10
+ value: 30.673000000000002
+ - type: ndcg_at_100
+ value: 35.618
+ - type: ndcg_at_1000
+ value: 38.517
+ - type: ndcg_at_3
+ value: 25.71
+ - type: ndcg_at_5
+ value: 27.679
+ - type: precision_at_1
+ value: 20.333000000000002
+ - type: precision_at_10
+ value: 4.9910000000000005
+ - type: precision_at_100
+ value: 0.8130000000000001
+ - type: precision_at_1000
+ value: 0.117
+ - type: precision_at_3
+ value: 11.029
+ - type: precision_at_5
+ value: 7.8740000000000006
+ - type: recall_at_1
+ value: 18.925
+ - type: recall_at_10
+ value: 43.311
+ - type: recall_at_100
+ value: 66.308
+ - type: recall_at_1000
+ value: 87.49
+ - type: recall_at_3
+ value: 29.596
+ - type: recall_at_5
+ value: 34.245
+ - task:
+ type: Retrieval
+ dataset:
+ type: climate-fever
+ name: MTEB ClimateFEVER
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 13.714
+ - type: map_at_10
+ value: 23.194
+ - type: map_at_100
+ value: 24.976000000000003
+ - type: map_at_1000
+ value: 25.166
+ - type: map_at_3
+ value: 19.709
+ - type: map_at_5
+ value: 21.523999999999997
+ - type: mrr_at_1
+ value: 30.619000000000003
+ - type: mrr_at_10
+ value: 42.563
+ - type: mrr_at_100
+ value: 43.386
+ - type: mrr_at_1000
+ value: 43.423
+ - type: mrr_at_3
+ value: 39.555
+ - type: mrr_at_5
+ value: 41.268
+ - type: ndcg_at_1
+ value: 30.619000000000003
+ - type: ndcg_at_10
+ value: 31.836
+ - type: ndcg_at_100
+ value: 38.652
+ - type: ndcg_at_1000
+ value: 42.088
+ - type: ndcg_at_3
+ value: 26.733
+ - type: ndcg_at_5
+ value: 28.435
+ - type: precision_at_1
+ value: 30.619000000000003
+ - type: precision_at_10
+ value: 9.751999999999999
+ - type: precision_at_100
+ value: 1.71
+ - type: precision_at_1000
+ value: 0.23500000000000001
+ - type: precision_at_3
+ value: 19.935
+ - type: precision_at_5
+ value: 14.984
+ - type: recall_at_1
+ value: 13.714
+ - type: recall_at_10
+ value: 37.26
+ - type: recall_at_100
+ value: 60.546
+ - type: recall_at_1000
+ value: 79.899
+ - type: recall_at_3
+ value: 24.325
+ - type: recall_at_5
+ value: 29.725
+ - task:
+ type: Retrieval
+ dataset:
+ type: dbpedia-entity
+ name: MTEB DBPedia
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 8.462
+ - type: map_at_10
+ value: 18.637
+ - type: map_at_100
+ value: 26.131999999999998
+ - type: map_at_1000
+ value: 27.607
+ - type: map_at_3
+ value: 13.333
+ - type: map_at_5
+ value: 15.654000000000002
+ - type: mrr_at_1
+ value: 66.25
+ - type: mrr_at_10
+ value: 74.32600000000001
+ - type: mrr_at_100
+ value: 74.60900000000001
+ - type: mrr_at_1000
+ value: 74.62
+ - type: mrr_at_3
+ value: 72.667
+ - type: mrr_at_5
+ value: 73.817
+ - type: ndcg_at_1
+ value: 53.87499999999999
+ - type: ndcg_at_10
+ value: 40.028999999999996
+ - type: ndcg_at_100
+ value: 44.199
+ - type: ndcg_at_1000
+ value: 51.629999999999995
+ - type: ndcg_at_3
+ value: 44.113
+ - type: ndcg_at_5
+ value: 41.731
+ - type: precision_at_1
+ value: 66.25
+ - type: precision_at_10
+ value: 31.900000000000002
+ - type: precision_at_100
+ value: 10.043000000000001
+ - type: precision_at_1000
+ value: 1.926
+ - type: precision_at_3
+ value: 47.417
+ - type: precision_at_5
+ value: 40.65
+ - type: recall_at_1
+ value: 8.462
+ - type: recall_at_10
+ value: 24.293
+ - type: recall_at_100
+ value: 50.146
+ - type: recall_at_1000
+ value: 74.034
+ - type: recall_at_3
+ value: 14.967
+ - type: recall_at_5
+ value: 18.682000000000002
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/emotion
+ name: MTEB EmotionClassification
+ config: default
+ split: test
+ revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
+ metrics:
+ - type: accuracy
+ value: 47.84499999999999
+ - type: f1
+ value: 42.48106691979349
+ - task:
+ type: Retrieval
+ dataset:
+ type: fever
+ name: MTEB FEVER
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 74.034
+ - type: map_at_10
+ value: 82.76
+ - type: map_at_100
+ value: 82.968
+ - type: map_at_1000
+ value: 82.98299999999999
+ - type: map_at_3
+ value: 81.768
+ - type: map_at_5
+ value: 82.418
+ - type: mrr_at_1
+ value: 80.048
+ - type: mrr_at_10
+ value: 87.64999999999999
+ - type: mrr_at_100
+ value: 87.712
+ - type: mrr_at_1000
+ value: 87.713
+ - type: mrr_at_3
+ value: 87.01100000000001
+ - type: mrr_at_5
+ value: 87.466
+ - type: ndcg_at_1
+ value: 80.048
+ - type: ndcg_at_10
+ value: 86.643
+ - type: ndcg_at_100
+ value: 87.361
+ - type: ndcg_at_1000
+ value: 87.606
+ - type: ndcg_at_3
+ value: 85.137
+ - type: ndcg_at_5
+ value: 86.016
+ - type: precision_at_1
+ value: 80.048
+ - type: precision_at_10
+ value: 10.372
+ - type: precision_at_100
+ value: 1.093
+ - type: precision_at_1000
+ value: 0.11299999999999999
+ - type: precision_at_3
+ value: 32.638
+ - type: precision_at_5
+ value: 20.177
+ - type: recall_at_1
+ value: 74.034
+ - type: recall_at_10
+ value: 93.769
+ - type: recall_at_100
+ value: 96.569
+ - type: recall_at_1000
+ value: 98.039
+ - type: recall_at_3
+ value: 89.581
+ - type: recall_at_5
+ value: 91.906
+ - task:
+ type: Retrieval
+ dataset:
+ type: fiqa
+ name: MTEB FiQA2018
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 20.5
+ - type: map_at_10
+ value: 32.857
+ - type: map_at_100
+ value: 34.589
+ - type: map_at_1000
+ value: 34.778
+ - type: map_at_3
+ value: 29.160999999999998
+ - type: map_at_5
+ value: 31.033
+ - type: mrr_at_1
+ value: 40.123
+ - type: mrr_at_10
+ value: 48.776
+ - type: mrr_at_100
+ value: 49.495
+ - type: mrr_at_1000
+ value: 49.539
+ - type: mrr_at_3
+ value: 46.605000000000004
+ - type: mrr_at_5
+ value: 47.654
+ - type: ndcg_at_1
+ value: 40.123
+ - type: ndcg_at_10
+ value: 40.343
+ - type: ndcg_at_100
+ value: 46.56
+ - type: ndcg_at_1000
+ value: 49.777
+ - type: ndcg_at_3
+ value: 37.322
+ - type: ndcg_at_5
+ value: 37.791000000000004
+ - type: precision_at_1
+ value: 40.123
+ - type: precision_at_10
+ value: 11.08
+ - type: precision_at_100
+ value: 1.752
+ - type: precision_at_1000
+ value: 0.232
+ - type: precision_at_3
+ value: 24.897
+ - type: precision_at_5
+ value: 17.809
+ - type: recall_at_1
+ value: 20.5
+ - type: recall_at_10
+ value: 46.388
+ - type: recall_at_100
+ value: 69.552
+ - type: recall_at_1000
+ value: 89.011
+ - type: recall_at_3
+ value: 33.617999999999995
+ - type: recall_at_5
+ value: 38.211
+ - task:
+ type: Retrieval
+ dataset:
+ type: hotpotqa
+ name: MTEB HotpotQA
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 39.135999999999996
+ - type: map_at_10
+ value: 61.673
+ - type: map_at_100
+ value: 62.562
+ - type: map_at_1000
+ value: 62.62
+ - type: map_at_3
+ value: 58.467999999999996
+ - type: map_at_5
+ value: 60.463
+ - type: mrr_at_1
+ value: 78.271
+ - type: mrr_at_10
+ value: 84.119
+ - type: mrr_at_100
+ value: 84.29299999999999
+ - type: mrr_at_1000
+ value: 84.299
+ - type: mrr_at_3
+ value: 83.18900000000001
+ - type: mrr_at_5
+ value: 83.786
+ - type: ndcg_at_1
+ value: 78.271
+ - type: ndcg_at_10
+ value: 69.935
+ - type: ndcg_at_100
+ value: 73.01299999999999
+ - type: ndcg_at_1000
+ value: 74.126
+ - type: ndcg_at_3
+ value: 65.388
+ - type: ndcg_at_5
+ value: 67.906
+ - type: precision_at_1
+ value: 78.271
+ - type: precision_at_10
+ value: 14.562
+ - type: precision_at_100
+ value: 1.6969999999999998
+ - type: precision_at_1000
+ value: 0.184
+ - type: precision_at_3
+ value: 41.841
+ - type: precision_at_5
+ value: 27.087
+ - type: recall_at_1
+ value: 39.135999999999996
+ - type: recall_at_10
+ value: 72.809
+ - type: recall_at_100
+ value: 84.86200000000001
+ - type: recall_at_1000
+ value: 92.208
+ - type: recall_at_3
+ value: 62.76199999999999
+ - type: recall_at_5
+ value: 67.718
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/imdb
+ name: MTEB ImdbClassification
+ config: default
+ split: test
+ revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
+ metrics:
+ - type: accuracy
+ value: 90.60600000000001
+ - type: ap
+ value: 86.6579587804335
+ - type: f1
+ value: 90.5938853929307
+ - task:
+ type: Retrieval
+ dataset:
+ type: msmarco
+ name: MTEB MSMARCO
+ config: default
+ split: dev
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 21.852
+ - type: map_at_10
+ value: 33.982
+ - type: map_at_100
+ value: 35.116
+ - type: map_at_1000
+ value: 35.167
+ - type: map_at_3
+ value: 30.134
+ - type: map_at_5
+ value: 32.340999999999994
+ - type: mrr_at_1
+ value: 22.479
+ - type: mrr_at_10
+ value: 34.594
+ - type: mrr_at_100
+ value: 35.672
+ - type: mrr_at_1000
+ value: 35.716
+ - type: mrr_at_3
+ value: 30.84
+ - type: mrr_at_5
+ value: 32.998
+ - type: ndcg_at_1
+ value: 22.493
+ - type: ndcg_at_10
+ value: 40.833000000000006
+ - type: ndcg_at_100
+ value: 46.357
+ - type: ndcg_at_1000
+ value: 47.637
+ - type: ndcg_at_3
+ value: 32.995999999999995
+ - type: ndcg_at_5
+ value: 36.919000000000004
+ - type: precision_at_1
+ value: 22.493
+ - type: precision_at_10
+ value: 6.465999999999999
+ - type: precision_at_100
+ value: 0.9249999999999999
+ - type: precision_at_1000
+ value: 0.104
+ - type: precision_at_3
+ value: 14.030999999999999
+ - type: precision_at_5
+ value: 10.413
+ - type: recall_at_1
+ value: 21.852
+ - type: recall_at_10
+ value: 61.934999999999995
+ - type: recall_at_100
+ value: 87.611
+ - type: recall_at_1000
+ value: 97.441
+ - type: recall_at_3
+ value: 40.583999999999996
+ - type: recall_at_5
+ value: 49.992999999999995
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/mtop_domain
+ name: MTEB MTOPDomainClassification (en)
+ config: en
+ split: test
+ revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
+ metrics:
+ - type: accuracy
+ value: 93.36069311445507
+ - type: f1
+ value: 93.16456330371453
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/mtop_intent
+ name: MTEB MTOPIntentClassification (en)
+ config: en
+ split: test
+ revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
+ metrics:
+ - type: accuracy
+ value: 74.74692202462381
+ - type: f1
+ value: 58.17903579421599
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_massive_intent
+ name: MTEB MassiveIntentClassification (en)
+ config: en
+ split: test
+ revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
+ metrics:
+ - type: accuracy
+ value: 74.80833893745796
+ - type: f1
+ value: 72.70786592684664
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_massive_scenario
+ name: MTEB MassiveScenarioClassification (en)
+ config: en
+ split: test
+ revision: 7d571f92784cd94a019292a1f45445077d0ef634
+ metrics:
+ - type: accuracy
+ value: 78.69872225958305
+ - type: f1
+ value: 78.61626934504731
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/medrxiv-clustering-p2p
+ name: MTEB MedrxivClusteringP2P
+ config: default
+ split: test
+ revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
+ metrics:
+ - type: v_measure
+ value: 33.058658628717694
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/medrxiv-clustering-s2s
+ name: MTEB MedrxivClusteringS2S
+ config: default
+ split: test
+ revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
+ metrics:
+ - type: v_measure
+ value: 30.85561739360599
+ - task:
+ type: Reranking
+ dataset:
+ type: mteb/mind_small
+ name: MTEB MindSmallReranking
+ config: default
+ split: test
+ revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
+ metrics:
+ - type: map
+ value: 31.290259910144385
+ - type: mrr
+ value: 32.44223046102856
+ - task:
+ type: Retrieval
+ dataset:
+ type: nfcorpus
+ name: MTEB NFCorpus
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 5.288
+ - type: map_at_10
+ value: 12.267999999999999
+ - type: map_at_100
+ value: 15.557000000000002
+ - type: map_at_1000
+ value: 16.98
+ - type: map_at_3
+ value: 8.866
+ - type: map_at_5
+ value: 10.418
+ - type: mrr_at_1
+ value: 43.653
+ - type: mrr_at_10
+ value: 52.681
+ - type: mrr_at_100
+ value: 53.315999999999995
+ - type: mrr_at_1000
+ value: 53.357
+ - type: mrr_at_3
+ value: 51.393
+ - type: mrr_at_5
+ value: 51.903999999999996
+ - type: ndcg_at_1
+ value: 42.415000000000006
+ - type: ndcg_at_10
+ value: 34.305
+ - type: ndcg_at_100
+ value: 30.825999999999997
+ - type: ndcg_at_1000
+ value: 39.393
+ - type: ndcg_at_3
+ value: 39.931
+ - type: ndcg_at_5
+ value: 37.519999999999996
+ - type: precision_at_1
+ value: 43.653
+ - type: precision_at_10
+ value: 25.728
+ - type: precision_at_100
+ value: 7.932
+ - type: precision_at_1000
+ value: 2.07
+ - type: precision_at_3
+ value: 38.184000000000005
+ - type: precision_at_5
+ value: 32.879000000000005
+ - type: recall_at_1
+ value: 5.288
+ - type: recall_at_10
+ value: 16.195
+ - type: recall_at_100
+ value: 31.135
+ - type: recall_at_1000
+ value: 61.531000000000006
+ - type: recall_at_3
+ value: 10.313
+ - type: recall_at_5
+ value: 12.754999999999999
+ - task:
+ type: Retrieval
+ dataset:
+ type: nq
+ name: MTEB NQ
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 28.216
+ - type: map_at_10
+ value: 42.588
+ - type: map_at_100
+ value: 43.702999999999996
+ - type: map_at_1000
+ value: 43.739
+ - type: map_at_3
+ value: 38.177
+ - type: map_at_5
+ value: 40.754000000000005
+ - type: mrr_at_1
+ value: 31.866
+ - type: mrr_at_10
+ value: 45.189
+ - type: mrr_at_100
+ value: 46.056000000000004
+ - type: mrr_at_1000
+ value: 46.081
+ - type: mrr_at_3
+ value: 41.526999999999994
+ - type: mrr_at_5
+ value: 43.704
+ - type: ndcg_at_1
+ value: 31.837
+ - type: ndcg_at_10
+ value: 50.178
+ - type: ndcg_at_100
+ value: 54.98800000000001
+ - type: ndcg_at_1000
+ value: 55.812
+ - type: ndcg_at_3
+ value: 41.853
+ - type: ndcg_at_5
+ value: 46.153
+ - type: precision_at_1
+ value: 31.837
+ - type: precision_at_10
+ value: 8.43
+ - type: precision_at_100
+ value: 1.1119999999999999
+ - type: precision_at_1000
+ value: 0.11900000000000001
+ - type: precision_at_3
+ value: 19.023
+ - type: precision_at_5
+ value: 13.911000000000001
+ - type: recall_at_1
+ value: 28.216
+ - type: recall_at_10
+ value: 70.8
+ - type: recall_at_100
+ value: 91.857
+ - type: recall_at_1000
+ value: 97.941
+ - type: recall_at_3
+ value: 49.196
+ - type: recall_at_5
+ value: 59.072
+ - task:
+ type: Retrieval
+ dataset:
+ type: quora
+ name: MTEB QuoraRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 71.22800000000001
+ - type: map_at_10
+ value: 85.115
+ - type: map_at_100
+ value: 85.72
+ - type: map_at_1000
+ value: 85.737
+ - type: map_at_3
+ value: 82.149
+ - type: map_at_5
+ value: 84.029
+ - type: mrr_at_1
+ value: 81.96
+ - type: mrr_at_10
+ value: 88.00200000000001
+ - type: mrr_at_100
+ value: 88.088
+ - type: mrr_at_1000
+ value: 88.089
+ - type: mrr_at_3
+ value: 87.055
+ - type: mrr_at_5
+ value: 87.715
+ - type: ndcg_at_1
+ value: 82.01
+ - type: ndcg_at_10
+ value: 88.78
+ - type: ndcg_at_100
+ value: 89.91
+ - type: ndcg_at_1000
+ value: 90.013
+ - type: ndcg_at_3
+ value: 85.957
+ - type: ndcg_at_5
+ value: 87.56
+ - type: precision_at_1
+ value: 82.01
+ - type: precision_at_10
+ value: 13.462
+ - type: precision_at_100
+ value: 1.528
+ - type: precision_at_1000
+ value: 0.157
+ - type: precision_at_3
+ value: 37.553
+ - type: precision_at_5
+ value: 24.732000000000003
+ - type: recall_at_1
+ value: 71.22800000000001
+ - type: recall_at_10
+ value: 95.69
+ - type: recall_at_100
+ value: 99.531
+ - type: recall_at_1000
+ value: 99.98
+ - type: recall_at_3
+ value: 87.632
+ - type: recall_at_5
+ value: 92.117
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/reddit-clustering
+ name: MTEB RedditClustering
+ config: default
+ split: test
+ revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
+ metrics:
+ - type: v_measure
+ value: 52.31768034366916
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/reddit-clustering-p2p
+ name: MTEB RedditClusteringP2P
+ config: default
+ split: test
+ revision: 282350215ef01743dc01b456c7f5241fa8937f16
+ metrics:
+ - type: v_measure
+ value: 60.640266772723606
+ - task:
+ type: Retrieval
+ dataset:
+ type: scidocs
+ name: MTEB SCIDOCS
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 4.7780000000000005
+ - type: map_at_10
+ value: 12.299
+ - type: map_at_100
+ value: 14.363000000000001
+ - type: map_at_1000
+ value: 14.71
+ - type: map_at_3
+ value: 8.738999999999999
+ - type: map_at_5
+ value: 10.397
+ - type: mrr_at_1
+ value: 23.599999999999998
+ - type: mrr_at_10
+ value: 34.845
+ - type: mrr_at_100
+ value: 35.916
+ - type: mrr_at_1000
+ value: 35.973
+ - type: mrr_at_3
+ value: 31.7
+ - type: mrr_at_5
+ value: 33.535
+ - type: ndcg_at_1
+ value: 23.599999999999998
+ - type: ndcg_at_10
+ value: 20.522000000000002
+ - type: ndcg_at_100
+ value: 28.737000000000002
+ - type: ndcg_at_1000
+ value: 34.596
+ - type: ndcg_at_3
+ value: 19.542
+ - type: ndcg_at_5
+ value: 16.958000000000002
+ - type: precision_at_1
+ value: 23.599999999999998
+ - type: precision_at_10
+ value: 10.67
+ - type: precision_at_100
+ value: 2.259
+ - type: precision_at_1000
+ value: 0.367
+ - type: precision_at_3
+ value: 18.333
+ - type: precision_at_5
+ value: 14.879999999999999
+ - type: recall_at_1
+ value: 4.7780000000000005
+ - type: recall_at_10
+ value: 21.617
+ - type: recall_at_100
+ value: 45.905
+ - type: recall_at_1000
+ value: 74.42
+ - type: recall_at_3
+ value: 11.148
+ - type: recall_at_5
+ value: 15.082999999999998
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sickr-sts
+ name: MTEB SICK-R
+ config: default
+ split: test
+ revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
+ metrics:
+ - type: cos_sim_pearson
+ value: 83.22372750297885
+ - type: cos_sim_spearman
+ value: 79.40972617119405
+ - type: euclidean_pearson
+ value: 80.6101072020434
+ - type: euclidean_spearman
+ value: 79.53844217225202
+ - type: manhattan_pearson
+ value: 80.57265975286111
+ - type: manhattan_spearman
+ value: 79.46335611792958
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts12-sts
+ name: MTEB STS12
+ config: default
+ split: test
+ revision: a0d554a64d88156834ff5ae9920b964011b16384
+ metrics:
+ - type: cos_sim_pearson
+ value: 85.43713315520749
+ - type: cos_sim_spearman
+ value: 77.44128693329532
+ - type: euclidean_pearson
+ value: 81.63869928101123
+ - type: euclidean_spearman
+ value: 77.29512977961515
+ - type: manhattan_pearson
+ value: 81.63704185566183
+ - type: manhattan_spearman
+ value: 77.29909412738657
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts13-sts
+ name: MTEB STS13
+ config: default
+ split: test
+ revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
+ metrics:
+ - type: cos_sim_pearson
+ value: 81.59451537860527
+ - type: cos_sim_spearman
+ value: 82.97994638856723
+ - type: euclidean_pearson
+ value: 82.89478688288412
+ - type: euclidean_spearman
+ value: 83.58740751053104
+ - type: manhattan_pearson
+ value: 82.69140840941608
+ - type: manhattan_spearman
+ value: 83.33665956040555
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts14-sts
+ name: MTEB STS14
+ config: default
+ split: test
+ revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
+ metrics:
+ - type: cos_sim_pearson
+ value: 82.00756527711764
+ - type: cos_sim_spearman
+ value: 81.83560996841379
+ - type: euclidean_pearson
+ value: 82.07684151976518
+ - type: euclidean_spearman
+ value: 82.00913052060511
+ - type: manhattan_pearson
+ value: 82.05690778488794
+ - type: manhattan_spearman
+ value: 82.02260252019525
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts15-sts
+ name: MTEB STS15
+ config: default
+ split: test
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
+ metrics:
+ - type: cos_sim_pearson
+ value: 86.13710262895447
+ - type: cos_sim_spearman
+ value: 87.26412811156248
+ - type: euclidean_pearson
+ value: 86.94151453230228
+ - type: euclidean_spearman
+ value: 87.5363796699571
+ - type: manhattan_pearson
+ value: 86.86989424083748
+ - type: manhattan_spearman
+ value: 87.47315940781353
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts16-sts
+ name: MTEB STS16
+ config: default
+ split: test
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
+ metrics:
+ - type: cos_sim_pearson
+ value: 83.0230597603627
+ - type: cos_sim_spearman
+ value: 84.93344499318864
+ - type: euclidean_pearson
+ value: 84.23754743431141
+ - type: euclidean_spearman
+ value: 85.09707376597099
+ - type: manhattan_pearson
+ value: 84.04325160987763
+ - type: manhattan_spearman
+ value: 84.89353071339909
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts17-crosslingual-sts
+ name: MTEB STS17 (en-en)
+ config: en-en
+ split: test
+ revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
+ metrics:
+ - type: cos_sim_pearson
+ value: 86.75620824563921
+ - type: cos_sim_spearman
+ value: 87.15065513706398
+ - type: euclidean_pearson
+ value: 88.26281533633521
+ - type: euclidean_spearman
+ value: 87.51963738643983
+ - type: manhattan_pearson
+ value: 88.25599267618065
+ - type: manhattan_spearman
+ value: 87.58048736047483
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts22-crosslingual-sts
+ name: MTEB STS22 (en)
+ config: en
+ split: test
+ revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
+ metrics:
+ - type: cos_sim_pearson
+ value: 64.74645319195137
+ - type: cos_sim_spearman
+ value: 65.29996325037214
+ - type: euclidean_pearson
+ value: 67.04297794086443
+ - type: euclidean_spearman
+ value: 65.43841726694343
+ - type: manhattan_pearson
+ value: 67.39459955690904
+ - type: manhattan_spearman
+ value: 65.92864704413651
+ - task:
+ type: STS
+ dataset:
+ type: mteb/stsbenchmark-sts
+ name: MTEB STSBenchmark
+ config: default
+ split: test
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
+ metrics:
+ - type: cos_sim_pearson
+ value: 84.31291020270801
+ - type: cos_sim_spearman
+ value: 85.86473738688068
+ - type: euclidean_pearson
+ value: 85.65537275064152
+ - type: euclidean_spearman
+ value: 86.13087454209642
+ - type: manhattan_pearson
+ value: 85.43946955047609
+ - type: manhattan_spearman
+ value: 85.91568175344916
+ - task:
+ type: Reranking
+ dataset:
+ type: mteb/scidocs-reranking
+ name: MTEB SciDocsRR
+ config: default
+ split: test
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
+ metrics:
+ - type: map
+ value: 85.93798118350695
+ - type: mrr
+ value: 95.93536274908824
+ - task:
+ type: Retrieval
+ dataset:
+ type: scifact
+ name: MTEB SciFact
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 57.594
+ - type: map_at_10
+ value: 66.81899999999999
+ - type: map_at_100
+ value: 67.368
+ - type: map_at_1000
+ value: 67.4
+ - type: map_at_3
+ value: 64.061
+ - type: map_at_5
+ value: 65.47
+ - type: mrr_at_1
+ value: 60.667
+ - type: mrr_at_10
+ value: 68.219
+ - type: mrr_at_100
+ value: 68.655
+ - type: mrr_at_1000
+ value: 68.684
+ - type: mrr_at_3
+ value: 66.22200000000001
+ - type: mrr_at_5
+ value: 67.289
+ - type: ndcg_at_1
+ value: 60.667
+ - type: ndcg_at_10
+ value: 71.275
+ - type: ndcg_at_100
+ value: 73.642
+ - type: ndcg_at_1000
+ value: 74.373
+ - type: ndcg_at_3
+ value: 66.521
+ - type: ndcg_at_5
+ value: 68.581
+ - type: precision_at_1
+ value: 60.667
+ - type: precision_at_10
+ value: 9.433
+ - type: precision_at_100
+ value: 1.0699999999999998
+ - type: precision_at_1000
+ value: 0.11299999999999999
+ - type: precision_at_3
+ value: 25.556
+ - type: precision_at_5
+ value: 16.8
+ - type: recall_at_1
+ value: 57.594
+ - type: recall_at_10
+ value: 83.622
+ - type: recall_at_100
+ value: 94.167
+ - type: recall_at_1000
+ value: 99.667
+ - type: recall_at_3
+ value: 70.64399999999999
+ - type: recall_at_5
+ value: 75.983
+ - task:
+ type: PairClassification
+ dataset:
+ type: mteb/sprintduplicatequestions-pairclassification
+ name: MTEB SprintDuplicateQuestions
+ config: default
+ split: test
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
+ metrics:
+ - type: cos_sim_accuracy
+ value: 99.85841584158416
+ - type: cos_sim_ap
+ value: 96.66996142314342
+ - type: cos_sim_f1
+ value: 92.83208020050125
+ - type: cos_sim_precision
+ value: 93.06532663316584
+ - type: cos_sim_recall
+ value: 92.60000000000001
+ - type: dot_accuracy
+ value: 99.85841584158416
+ - type: dot_ap
+ value: 96.6775307676576
+ - type: dot_f1
+ value: 92.69289729177312
+ - type: dot_precision
+ value: 94.77533960292581
+ - type: dot_recall
+ value: 90.7
+ - type: euclidean_accuracy
+ value: 99.86138613861387
+ - type: euclidean_ap
+ value: 96.6338454403108
+ - type: euclidean_f1
+ value: 92.92214357937311
+ - type: euclidean_precision
+ value: 93.96728016359918
+ - type: euclidean_recall
+ value: 91.9
+ - type: manhattan_accuracy
+ value: 99.86237623762376
+ - type: manhattan_ap
+ value: 96.60370449645053
+ - type: manhattan_f1
+ value: 92.91177970423253
+ - type: manhattan_precision
+ value: 94.7970863683663
+ - type: manhattan_recall
+ value: 91.10000000000001
+ - type: max_accuracy
+ value: 99.86237623762376
+ - type: max_ap
+ value: 96.6775307676576
+ - type: max_f1
+ value: 92.92214357937311
+ - task:
2268
+ type: Clustering
2269
+ dataset:
2270
+ type: mteb/stackexchange-clustering
2271
+ name: MTEB StackExchangeClustering
2272
+ config: default
2273
+ split: test
2274
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
2275
+ metrics:
2276
+ - type: v_measure
2277
+ value: 60.77977058695198
  - task:
      type: Clustering
    dataset:
      type: mteb/stackexchange-clustering-p2p
      name: MTEB StackExchangeClusteringP2P
      config: default
      split: test
      revision: 815ca46b2622cec33ccafc3735d572c266efdb44
    metrics:
    - type: v_measure
      value: 35.2725272535638
  - task:
      type: Reranking
    dataset:
      type: mteb/stackoverflowdupquestions-reranking
      name: MTEB StackOverflowDupQuestions
      config: default
      split: test
      revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
    metrics:
    - type: map
      value: 53.64052466362125
    - type: mrr
      value: 54.533067014684654
  - task:
      type: Summarization
    dataset:
      type: mteb/summeval
      name: MTEB SummEval
      config: default
      split: test
      revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
    metrics:
    - type: cos_sim_pearson
      value: 30.677624219206578
    - type: cos_sim_spearman
      value: 30.121368518123447
    - type: dot_pearson
      value: 30.69870088041608
    - type: dot_spearman
      value: 29.61284927093751
  - task:
      type: Retrieval
    dataset:
      type: trec-covid
      name: MTEB TRECCOVID
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 0.22
    - type: map_at_10
      value: 1.855
    - type: map_at_100
      value: 9.885
    - type: map_at_1000
      value: 23.416999999999998
    - type: map_at_3
      value: 0.637
    - type: map_at_5
      value: 1.024
    - type: mrr_at_1
      value: 88.0
    - type: mrr_at_10
      value: 93.067
    - type: mrr_at_100
      value: 93.067
    - type: mrr_at_1000
      value: 93.067
    - type: mrr_at_3
      value: 92.667
    - type: mrr_at_5
      value: 93.067
    - type: ndcg_at_1
      value: 82.0
    - type: ndcg_at_10
      value: 75.899
    - type: ndcg_at_100
      value: 55.115
    - type: ndcg_at_1000
      value: 48.368
    - type: ndcg_at_3
      value: 79.704
    - type: ndcg_at_5
      value: 78.39699999999999
    - type: precision_at_1
      value: 88.0
    - type: precision_at_10
      value: 79.60000000000001
    - type: precision_at_100
      value: 56.06
    - type: precision_at_1000
      value: 21.206
    - type: precision_at_3
      value: 84.667
    - type: precision_at_5
      value: 83.2
    - type: recall_at_1
      value: 0.22
    - type: recall_at_10
      value: 2.078
    - type: recall_at_100
      value: 13.297
    - type: recall_at_1000
      value: 44.979
    - type: recall_at_3
      value: 0.6689999999999999
    - type: recall_at_5
      value: 1.106
  - task:
      type: Retrieval
    dataset:
      type: webis-touche2020
      name: MTEB Touche2020
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 2.258
    - type: map_at_10
      value: 10.439
    - type: map_at_100
      value: 16.89
    - type: map_at_1000
      value: 18.407999999999998
    - type: map_at_3
      value: 5.668
    - type: map_at_5
      value: 7.718
    - type: mrr_at_1
      value: 32.653
    - type: mrr_at_10
      value: 51.159
    - type: mrr_at_100
      value: 51.714000000000006
    - type: mrr_at_1000
      value: 51.714000000000006
    - type: mrr_at_3
      value: 47.959
    - type: mrr_at_5
      value: 50.407999999999994
    - type: ndcg_at_1
      value: 29.592000000000002
    - type: ndcg_at_10
      value: 26.037
    - type: ndcg_at_100
      value: 37.924
    - type: ndcg_at_1000
      value: 49.126999999999995
    - type: ndcg_at_3
      value: 30.631999999999998
    - type: ndcg_at_5
      value: 28.571
    - type: precision_at_1
      value: 32.653
    - type: precision_at_10
      value: 22.857
    - type: precision_at_100
      value: 7.754999999999999
    - type: precision_at_1000
      value: 1.529
    - type: precision_at_3
      value: 34.014
    - type: precision_at_5
      value: 29.796
    - type: recall_at_1
      value: 2.258
    - type: recall_at_10
      value: 16.554
    - type: recall_at_100
      value: 48.439
    - type: recall_at_1000
      value: 82.80499999999999
    - type: recall_at_3
      value: 7.283
    - type: recall_at_5
      value: 10.732
  - task:
      type: Classification
    dataset:
      type: mteb/toxic_conversations_50k
      name: MTEB ToxicConversationsClassification
      config: default
      split: test
      revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
    metrics:
    - type: accuracy
      value: 69.8858
    - type: ap
      value: 13.835684144362109
    - type: f1
      value: 53.803351693244586
  - task:
      type: Classification
    dataset:
      type: mteb/tweet_sentiment_extraction
      name: MTEB TweetSentimentExtractionClassification
      config: default
      split: test
      revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
    metrics:
    - type: accuracy
      value: 60.50650820599886
    - type: f1
      value: 60.84357825979259
  - task:
      type: Clustering
    dataset:
      type: mteb/twentynewsgroups-clustering
      name: MTEB TwentyNewsgroupsClustering
      config: default
      split: test
      revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
    metrics:
    - type: v_measure
      value: 48.52131044852134
  - task:
      type: PairClassification
    dataset:
      type: mteb/twittersemeval2015-pairclassification
      name: MTEB TwitterSemEval2015
      config: default
      split: test
      revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
    metrics:
    - type: cos_sim_accuracy
      value: 85.59337187816654
    - type: cos_sim_ap
      value: 73.23925826533437
    - type: cos_sim_f1
      value: 67.34693877551021
    - type: cos_sim_precision
      value: 62.40432237730752
    - type: cos_sim_recall
      value: 73.13984168865434
    - type: dot_accuracy
      value: 85.31322644096085
    - type: dot_ap
      value: 72.30723963807422
    - type: dot_f1
      value: 66.47051612112296
    - type: dot_precision
      value: 62.0792305930845
    - type: dot_recall
      value: 71.53034300791556
    - type: euclidean_accuracy
      value: 85.61125350181797
    - type: euclidean_ap
      value: 73.32843720487845
    - type: euclidean_f1
      value: 67.36549633745895
    - type: euclidean_precision
      value: 64.60755813953489
    - type: euclidean_recall
      value: 70.36939313984169
    - type: manhattan_accuracy
      value: 85.63509566668654
    - type: manhattan_ap
      value: 73.16658488311325
    - type: manhattan_f1
      value: 67.20597386434349
    - type: manhattan_precision
      value: 63.60424028268551
    - type: manhattan_recall
      value: 71.2401055408971
    - type: max_accuracy
      value: 85.63509566668654
    - type: max_ap
      value: 73.32843720487845
    - type: max_f1
      value: 67.36549633745895
  - task:
      type: PairClassification
    dataset:
      type: mteb/twitterurlcorpus-pairclassification
      name: MTEB TwitterURLCorpus
      config: default
      split: test
      revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
    metrics:
    - type: cos_sim_accuracy
      value: 88.33779640625606
    - type: cos_sim_ap
      value: 84.83868375898157
    - type: cos_sim_f1
      value: 77.16506154017773
    - type: cos_sim_precision
      value: 74.62064005753327
    - type: cos_sim_recall
      value: 79.88912842623961
    - type: dot_accuracy
      value: 88.02732176815307
    - type: dot_ap
      value: 83.95089283763002
    - type: dot_f1
      value: 76.29635101196631
    - type: dot_precision
      value: 73.31771720613288
    - type: dot_recall
      value: 79.52725592854944
    - type: euclidean_accuracy
      value: 88.44452206310397
    - type: euclidean_ap
      value: 84.98384576824827
    - type: euclidean_f1
      value: 77.29311047696697
    - type: euclidean_precision
      value: 74.51232583065381
    - type: euclidean_recall
      value: 80.28949799815214
    - type: manhattan_accuracy
      value: 88.47362906042613
    - type: manhattan_ap
      value: 84.91421462218432
    - type: manhattan_f1
      value: 77.05107637204792
    - type: manhattan_precision
      value: 74.74484256243214
    - type: manhattan_recall
      value: 79.50415768401602
    - type: max_accuracy
      value: 88.47362906042613
    - type: max_ap
      value: 84.98384576824827
    - type: max_f1
      value: 77.29311047696697
license: mit
language:
- en
---
# Fast-Inference with CTranslate2
Speed up inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.

Quantized version of [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5).
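
The "2x-4x" figure follows from bytes per weight: int8 stores one byte per parameter, versus two for float16 and four for float32. A back-of-the-envelope sketch, assuming only the weights are counted (activations, runtime buffers and any layers kept in higher precision are ignored) and using a hypothetical parameter count:

```python
# Bytes needed to store one weight in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(num_params: int, dtype: str) -> float:
    """Approximate weight memory in MiB (weights only, no activations)."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


def savings_vs_int8(dtype: str) -> float:
    """How many times smaller int8 weights are than weights stored in `dtype`."""
    return BYTES_PER_PARAM[dtype] / BYTES_PER_PARAM["int8"]


if __name__ == "__main__":
    n = 33_000_000  # hypothetical parameter count, for illustration only
    for dtype in ("float32", "float16", "int8"):
        print(f"{dtype:>8}: {weight_memory_mb(n, dtype):7.1f} MiB")
    print("int8 vs float16:", savings_vs_int8("float16"), "x")  # 2.0 x
    print("int8 vs float32:", savings_vs_int8("float32"), "x")  # 4.0 x
```

The lower bound (2x) applies when comparing against a float16 deployment, the upper bound (4x) against float32.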
```bash
pip install "hf-hub-ctranslate2>=2.12.0" "ctranslate2>=3.17.1"
```

```python
# from transformers import AutoTokenizer
model_name = "michaelfeil/ct2fast-bge-small-en-v1.5"
model_name_orig = "BAAI/bge-small-en-v1.5"

from hf_hub_ctranslate2 import EncoderCT2fromHfHub
model = EncoderCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
    max_length=64,
)  # perform downstream tasks on outputs
outputs["pooler_output"]
outputs["last_hidden_state"]
outputs["attention_mask"]

# alternative: use the SentenceTransformer mix-in
# for end-to-end sentence embedding generation
# (not pulling from this CT2fast-HF repo)

from hf_hub_ctranslate2 import CT2SentenceTransformer
model = CT2SentenceTransformer(
    model_name_orig, compute_type="int8_float16", device="cuda"
)
embeddings = model.encode(
    ["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
    batch_size=32,
    convert_to_numpy=True,
    normalize_embeddings=True,
)
print(embeddings.shape, embeddings)
scores = (embeddings @ embeddings.T) * 100

# Hint: you can also serve this model behind a REST API
# via github.com/michaelfeil/infinity
```
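
The original BGE description below takes the last hidden state of the first token ([CLS]) as the sentence embedding and then L2-normalizes it. If you work with the raw `EncoderCT2fromHfHub` outputs instead of `CT2SentenceTransformer`, that pooling step could be sketched as follows. It deliberately uses `last_hidden_state` rather than `pooler_output`, and it assumes the array can be converted to nested Python lists shaped (batch, seq_len, hidden), e.g. via `.tolist()`; the helper name `cls_pool_and_normalize` is hypothetical and not part of hf-hub-ctranslate2.

```python
import math
from typing import List

Matrix = List[List[float]]


def cls_pool_and_normalize(last_hidden_state: List[Matrix]) -> Matrix:
    """Take the first-token ([CLS]) vector of each sequence and L2-normalize it.

    `last_hidden_state` is assumed to be nested lists shaped (batch, seq_len, hidden).
    """
    embeddings = []
    for sequence in last_hidden_state:
        cls_vector = sequence[0]  # hidden state of the first token
        norm = math.sqrt(sum(x * x for x in cls_vector)) or 1.0  # avoid division by zero
        embeddings.append([x / norm for x in cls_vector])
    return embeddings


# Toy input: batch of 2 sequences, 2 tokens each, hidden size 2.
toy = [
    [[3.0, 4.0], [9.0, 9.0]],
    [[0.0, 2.0], [1.0, 1.0]],
]
print(cls_pool_and_normalize(toy))  # [[0.6, 0.8], [0.0, 1.0]]
```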

Checkpoint compatible with [ctranslate2>=3.17.1](https://github.com/OpenNMT/CTranslate2)
and [hf-hub-ctranslate2>=2.12.0](https://github.com/michaelfeil/hf-hub-ctranslate2):
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`
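
The two bullet points above amount to a simple device-to-`compute_type` mapping. A small sketch that encodes it; the helper name `pick_compute_type` is hypothetical, and its result would be passed as `compute_type=` together with the matching `device=` to `EncoderCT2fromHfHub` or `CT2SentenceTransformer`, as in the example above.

```python
# Recommended compute types for this checkpoint, as listed above.
RECOMMENDED_COMPUTE_TYPE = {
    "cuda": "int8_float16",  # mixed int8/float16 on GPU
    "cpu": "int8",  # int8 on CPU
}


def pick_compute_type(device: str) -> str:
    """Return the recommended CTranslate2 compute_type for `device`."""
    try:
        return RECOMMENDED_COMPUTE_TYPE[device]
    except KeyError:
        raise ValueError(f"unsupported device {device!r}; expected 'cuda' or 'cpu'") from None


print(pick_compute_type("cuda"))  # int8_float16
print(pick_compute_type("cpu"))   # int8
```

Failing loudly on an unknown device avoids silently loading the model with an unsuitable precision.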

Converted on 2023-10-13 using
```
LLama-2 -> removed <pad> token.
```

# License and other remarks
This is just a quantized version. License conditions are intended to be identical to those of the original Hugging Face repo.

# Original description

<h1 align="center">FlagEmbedding</h1>

<h4 align="center">
    <p>
        <a href=#model-list>Model List</a> |
        <a href=#frequently-asked-questions>FAQ</a> |
        <a href=#usage>Usage</a> |
        <a href="#evaluation">Evaluation</a> |
        <a href="#train">Train</a> |
        <a href="#contact">Contact</a> |
        <a href="#citation">Citation</a> |
        <a href="#license">License</a>
    </p>
</h4>

For more details, please refer to our GitHub repository: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding).

[English](README.md) | [中文 (Chinese)](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md)

FlagEmbedding can map any text to a low-dimensional dense vector, which can be used for tasks like retrieval, classification, clustering, or semantic search.
It can also be used in vector databases for LLMs.

************* 🌟**Updates**🌟 *************
- 10/12/2023: Release [LLM-Embedder](./FlagEmbedding/llm_embedder/README.md), a unified embedding model to support diverse retrieval augmentation needs for LLMs. [Paper](https://arxiv.org/pdf/2310.07554.pdf) :fire:
- 09/15/2023: The [technical report](https://arxiv.org/pdf/2309.07597.pdf) of BGE has been released.
- 09/15/2023: The [massive training data](https://data.baai.ac.cn/details/BAAI-MTP) of BGE has been released.
- 09/12/2023: New models:
    - **New reranker model**: release cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than the embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models.
    - **Updated embedding model**: release `bge-*-v1.5` embedding models to alleviate the issue of the similarity distribution and to enhance retrieval ability without an instruction.


<details>
  <summary>More</summary>
<!-- ### More -->

- 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): add a script to mine hard negatives and support adding instructions during fine-tuning.
- 08/09/2023: BGE models are integrated into **Langchain**; you can use them like [this](#using-langchain). The C-MTEB **leaderboard** is [available](https://huggingface.co/spaces/mteb/leaderboard).
- 08/05/2023: Release base-scale and small-scale models, **best performance among models of the same size 🤗**
- 08/02/2023: Release `bge-large-*` (short for BAAI General Embedding) models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada:
- 08/01/2023: Release the [Chinese Massive Text Embedding Benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (**C-MTEB**), consisting of 31 test datasets.

</details>

## Model List

`bge` is short for `BAAI general embedding`.

| Model | Language | Links | Description | Query instruction for retrieval [1] |
|:-------------------------------|:--------:| :--------:| :--------:|:--------:|
| [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See [README](./FlagEmbedding/llm_embedder/README.md) |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-en` | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `为这个句子生成表示以用于检索相关文章:` |


[1\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed and you can use the original query directly. In all cases, **no instruction** needs to be added to passages. (The Chinese instruction `为这个句子生成表示以用于检索相关文章:` means "Generate a representation for this sentence for retrieving related articles:".)

[2\]: Unlike an embedding model, a reranker takes the question and the document together as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, cross-encoders are widely used to re-rank the top-k documents retrieved by simpler models.
For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results.
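
The two-stage recipe in footnote [2] can be sketched as follows. The scoring functions are hypothetical stand-ins (`toy_embedding_score` uses word overlap, `toy_rerank_score` adds a hand-written bonus) so that the sketch runs without any model; in practice the first stage would use bge embeddings and the second the bge reranker.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    embed_score: Scorer,
    rerank_score: Scorer,
    retrieve_k: int = 100,
    final_k: int = 3,
) -> List[str]:
    """Stage 1: cheap scores over the whole corpus, keep the top `retrieve_k`.
    Stage 2: expensive scores over the candidates only, keep the top `final_k`."""
    candidates = sorted(corpus, key=lambda doc: embed_score(query, doc), reverse=True)[:retrieve_k]
    reranked = sorted(candidates, key=lambda doc: rerank_score(query, doc), reverse=True)
    return reranked[:final_k]


# Hypothetical stand-in scorers, for illustration only.
def toy_embedding_score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q | d) or 1)  # Jaccard word overlap


def toy_rerank_score(query: str, doc: str) -> float:
    # Pretend the cross-encoder "knows" that a bear description answers a panda question.
    return toy_embedding_score(query, doc) + (1.0 if "bear" in doc.lower() else 0.0)


corpus = [
    "hi",
    "what is new",
    "The giant panda is a bear species endemic to China",
    "panda express menu",
]
print(retrieve_then_rerank("what is a panda", corpus, toy_embedding_score, toy_rerank_score,
                           retrieve_k=3, final_k=1))
```

Note the design trade-off: the reranker can only promote documents that stage 1 already retrieved, so `retrieve_k` bounds the recall of the whole pipeline (with `retrieve_k=1` above, the reranker never sees the panda passage).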

All models have been uploaded to the Hugging Face Hub; you can find them at https://huggingface.co/BAAI.
If you cannot access the Hugging Face Hub, you can also download the models from https://model.baai.ac.cn/models .


## Frequently asked questions

<details>
  <summary>1. How to fine-tune bge embedding model?</summary>

  <!-- ### How to fine-tune bge embedding model? -->
Follow this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) to prepare data and fine-tune your model.
Some suggestions:
- Mine hard negatives following this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives), which can improve retrieval performance.
- If you pre-train bge on your data, the pre-trained model cannot be used directly to calculate similarity; it must first be fine-tuned with contrastive learning.
- If the accuracy of the fine-tuned model is still not high enough, we recommend using or fine-tuning the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
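
The idea behind mining hard negatives can be sketched in plain Python: rank the corpus for each query with the current model, skip the known positives, and sample negatives from a band of high-but-not-top ranks. This is a simplified illustration, not the FlagEmbedding script; the function name, the rank band `(start, end)` and the sample size are hypothetical parameters.

```python
import random
from typing import List, Sequence


def mine_hard_negatives(
    ranked_ids: Sequence[str],
    positives: Sequence[str],
    start: int = 2,
    end: int = 50,
    num_negatives: int = 3,
    seed: int = 0,
) -> List[str]:
    """Sample negatives from ranks [start, end) of a retrieval result, excluding positives.

    Skipping the very top ranks reduces the chance of picking unlabeled positives
    (false negatives); staying below `end` keeps the negatives "hard".
    """
    positive_set = set(positives)
    band = [doc_id for doc_id in ranked_ids[start:end] if doc_id not in positive_set]
    rng = random.Random(seed)  # seeded for reproducible training data
    return rng.sample(band, min(num_negatives, len(band)))


ranked = ["p1", "d7", "d3", "d9", "d4", "d8", "d2"]  # toy ranking produced by the current model
print(mine_hard_negatives(ranked, positives=["p1"], start=1, end=5, num_negatives=2))
```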


</details>

<details>
  <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary>

  <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 -->
**We suggest using bge v1.5, which alleviates the issue of the similarity distribution.**

Since we fine-tune the models by contrastive learning with a temperature of 0.01,
the similarity distribution of the current BGE model lies roughly in the interval \[0.6, 1\].
So a similarity score greater than 0.5 does not indicate that the two sentences are similar.
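
Why does a low temperature compress the similarity distribution? A minimal sketch, assuming a standard InfoNCE-style contrastive loss, i.e. loss = -log(exp(s_pos/τ) / Σ_j exp(s_j/τ)) over the positive and the negatives; the exact BGE objective may differ in details such as in-batch negatives. With τ = 0.01, a positive that beats a negative by only 0.1 in cosine similarity already gives a near-zero loss, so training has little incentive to push unrelated texts toward low similarities.

```python
import math
from typing import Sequence


def info_nce_loss(pos_sim: float, neg_sims: Sequence[float], temperature: float) -> float:
    """-log softmax of the positive among [positive] + negatives, with logits = sim / temperature."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability (log-sum-exp trick)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


for tau in (1.0, 0.1, 0.01):
    loss = info_nce_loss(pos_sim=0.9, neg_sims=[0.8], temperature=tau)
    print(f"tau={tau}: loss={loss:.6f}")
```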

For downstream tasks, such as passage retrieval or semantic similarity,
**what matters is the relative order of the scores, not the absolute value.**
If you need to filter similar sentences based on a similarity threshold,
please select an appropriate threshold based on the similarity distribution of your data (such as 0.8, 0.85, or even 0.9).
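
One way to pick such a threshold from your own data is to score a small set of pairs you have labeled as similar or dissimilar, then choose the cut-off that best separates them. A minimal sketch, assuming you already have cosine scores for the labeled pairs; `pick_threshold` and the toy scores are hypothetical and only illustrate the procedure (maximizing accuracy over candidate thresholds).

```python
from typing import Sequence, Tuple


def pick_threshold(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, accuracy) maximizing the accuracy of the rule `score >= threshold`.

    Candidate thresholds are the observed scores themselves.
    """
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):
        correct = sum((s >= t) == y for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:  # strict: ties keep the lower threshold
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy scores in the compressed [0.6, 1] range described above; True = similar pair.
scores = [0.91, 0.88, 0.86, 0.83, 0.79, 0.74, 0.70, 0.65]
labels = [True, True, True, False, True, False, False, False]
print(pick_threshold(scores, labels))  # (0.79, 0.875)
```

Choose the objective (accuracy, F1, precision at a fixed recall) to match what a false match costs in your application.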

</details>

<details>
  <summary>3. When does the query instruction need to be used</summary>

  <!-- ### When does the query instruction need to be used -->

For `bge-*-v1.5`, we improved retrieval ability when no instruction is used.
Omitting the instruction causes only a slight degradation in retrieval performance compared with using it,
so for convenience you can generate embeddings without an instruction in all cases.

For a retrieval task that uses short queries to find long related documents,
we recommend adding instructions to these short queries.
**The best way to decide whether to add instructions to queries is to choose the setting that performs better on your task.**
In all cases, no instruction needs to be added to the documents/passages.
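
Choosing between the two settings is an empirical comparison on your own labeled queries. A minimal sketch of that comparison, assuming you have run retrieval twice (once with the instruction prepended to the queries, once without) and collected the ranked passage ids per query; `recall_at_k`, the ids and the toy rankings are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Mean over queries of (number of relevant ids in the top k) / (number of relevant ids)."""
    total = 0.0
    for qid, rel in relevant.items():
        top_k = set(ranked.get(qid, [])[:k])
        total += len(rel & top_k) / len(rel)
    return total / len(relevant)


relevant = {"q1": {"p1"}, "q2": {"p4", "p5"}}
with_instruction = {"q1": ["p1", "p2", "p3"], "q2": ["p4", "p9", "p5"]}
without_instruction = {"q1": ["p2", "p1", "p3"], "q2": ["p9", "p8", "p4"]}

for name, ranked in [("with", with_instruction), ("without", without_instruction)]:
    print(name, "instruction: recall@2 =", recall_at_k(ranked, relevant, k=2))
```

Keep whichever setting scores higher on the metric you care about; passages are encoded the same way in both runs.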

</details>


## Usage

### Usage for Embedding Model

Here are some examples of using `bge` models with
[FlagEmbedding](#using-flagembedding), [Sentence-Transformers](#using-sentence-transformers), [Langchain](#using-langchain), or [Huggingface Transformers](#using-huggingface-transformers).

#### Using FlagEmbedding
```
pip install -U FlagEmbedding
```
If it doesn't work for you, see [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md) for other ways to install FlagEmbedding.

```python
from FlagEmbedding import FlagModel
sentences_1 = ["样例数据-1", "样例数据-2"]  # Chinese for "sample data-1", "sample data-2"
sentences_2 = ["样例数据-3", "样例数据-4"]
model = FlagModel('BAAI/bge-large-zh-v1.5',
                  query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
                  use_fp16=True)  # Setting use_fp16 to True speeds up computation with a slight performance degradation
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)

# for s2p (short query to long passage) retrieval tasks, we suggest encode_queries(), which automatically adds the instruction to each query
# the corpus in a retrieval task can still use encode() or encode_corpus(), since passages don't need the instruction
queries = ['query_1', 'query_2']
passages = ["样例文档-1", "样例文档-2"]  # Chinese for "sample document-1", "sample document-2"
q_embeddings = model.encode_queries(queries)
p_embeddings = model.encode(passages)
scores = q_embeddings @ p_embeddings.T
```
For the value of the argument `query_instruction_for_retrieval`, see [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list).

By default, FlagModel uses all available GPUs when encoding. Set `os.environ["CUDA_VISIBLE_DEVICES"]` to select specific GPUs.
You can also set `os.environ["CUDA_VISIBLE_DEVICES"]=""` to make all GPUs unavailable.
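
Because CUDA reads `CUDA_VISIBLE_DEVICES` when it initializes in a process, the variable is safest to set at the very top of your script, before importing torch or FlagEmbedding. A minimal sketch; the GPU indices `"0,1"` are only an example.

```python
import os

# Set before any library initializes CUDA (i.e., before importing torch / FlagEmbedding).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # expose only GPUs 0 and 1
# os.environ["CUDA_VISIBLE_DEVICES"] = ""   # alternatively: hide all GPUs and run on CPU

# from FlagEmbedding import FlagModel  # import afterwards
```

Setting the variable in the shell (`CUDA_VISIBLE_DEVICES=0,1 python script.py`) achieves the same without touching the code.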


#### Using Sentence-Transformers

You can also use the `bge` models with [sentence-transformers](https://www.SBERT.net):

```
pip install -U sentence-transformers
```
```python
from sentence_transformers import SentenceTransformer
sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]
model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
```
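
Setting `normalize_embeddings=True` makes every embedding unit-length, so the matrix product `embeddings_1 @ embeddings_2.T` above equals the cosine similarity. A dependency-free sketch of that identity and of a brute-force top-k search over normalized vectors; the toy vectors stand in for real `bge` embeddings.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def top_k(query: Sequence[float], corpus: Sequence[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Brute-force search: (index, score) pairs with the highest dot products after normalization."""
    q = normalize(query)
    scored = [(i, dot(q, normalize(doc))) for i, doc in enumerate(corpus)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


a, b = [3.0, 4.0], [4.0, 3.0]
print(dot(normalize(a), normalize(b)), cosine(a, b))  # both 0.96, up to floating-point rounding
print(top_k([1.0, 0.0], [[0.0, 2.0], [5.0, 1.0], [1.0, 1.0]], k=2))
```

For large corpora the same dot-product search is usually delegated to a vector database or an approximate nearest-neighbor index rather than a Python loop.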
For s2p (short query to long passage) retrieval tasks,
each short query should start with an instruction (see [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list) for the instructions).
The instruction is not needed for passages.
```python
from sentence_transformers import SentenceTransformer
queries = ['query_1', 'query_2']
passages = ["样例文档-1", "样例文档-2"]
instruction = "为这个句子生成表示以用于检索相关文章:"

model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
q_embeddings = model.encode([instruction + q for q in queries], normalize_embeddings=True)
p_embeddings = model.encode(passages, normalize_embeddings=True)
scores = q_embeddings @ p_embeddings.T
```

#### Using Langchain

You can use `bge` in Langchain like this:
```python
from langchain.embeddings import HuggingFaceBgeEmbeddings
model_name = "BAAI/bge-large-en-v1.5"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True}  # set True to compute cosine similarity
model = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    # English models use the English query instruction from the Model List
    query_instruction="Represent this sentence for searching relevant passages: "
)
model.query_instruction = "Represent this sentence for searching relevant passages: "
```


#### Using HuggingFace Transformers

With the transformers package, you can use the model like this: first, pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.

```python
from transformers import AutoTokenizer, AutoModel
import torch
# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-zh-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-large-zh-v1.5')
model.eval()

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# for s2p (short query to long passage) retrieval tasks, add an instruction to the query (not to passages)
# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
    # Perform pooling. In this case, cls pooling.
    sentence_embeddings = model_output[0][:, 0]
# normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
```

### Usage for Reranker

Unlike an embedding model, a reranker takes the question and the document as input and directly outputs a similarity score instead of an embedding.
You can get a relevance score by feeding a query and a passage to the reranker.
The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
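
If you need scores in a fixed range (for example, to apply one cut-off across queries), a common option is to pass the raw reranker outputs through a sigmoid, which maps them into (0, 1) without changing their order. This is a sketch of that post-processing step, not something the reranker does by itself, and the mapped values should not be read as calibrated probabilities; the raw scores below are made up.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_unit_interval(raw_scores: Sequence[float]) -> List[float]:
    return [sigmoid(s) for s in raw_scores]


raw = [-8.2, 0.0, 5.3]  # hypothetical raw reranker scores (unbounded)
probs = to_unit_interval(raw)
print(probs)  # increasing values; the middle one is exactly 0.5
```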


#### Using FlagEmbedding
```
pip install -U FlagEmbedding
```

Get relevance scores (higher scores indicate more relevance):
```python
from FlagEmbedding import FlagReranker
reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True)  # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)
```


#### Using Huggingface transformers

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)
```
2965
+
+ ## Evaluation
+
+ `baai-general-embedding` models achieved **state-of-the-art performance on both the MTEB and C-MTEB leaderboards at the time of release!**
+ For more details and evaluation tools, see our [scripts](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md).
+
+ - **MTEB**:
+
+ | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
+ |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
+ | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
+ | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
+ | [bge-large-en](https://huggingface.co/BAAI/bge-large-en) | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
+ | [bge-base-en](https://huggingface.co/BAAI/bge-base-en) | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
+ | [gte-large](https://huggingface.co/thenlper/gte-large) | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
+ | [gte-base](https://huggingface.co/thenlper/gte-base) | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
+ | [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
+ | [bge-small-en](https://huggingface.co/BAAI/bge-small-en) | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
+ | [instructor-xl](https://huggingface.co/hkunlp/instructor-xl) | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
+ | [e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
+ | [gte-small](https://huggingface.co/thenlper/gte-small) | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
+ | [text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings) | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
+ | [e5-small-v2](https://huggingface.co/intfloat/e5-small-v2) | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
+ | [sentence-t5-xxl](https://huggingface.co/sentence-transformers/sentence-t5-xxl) | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
+ | [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
+ | [sgpt-bloom-7b1-msmarco](https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco) | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |
+
+
+ - **C-MTEB**:
+ We created C-MTEB, a benchmark for Chinese text embeddings consisting of 31 datasets across 6 tasks.
+ Please refer to [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md) for a detailed introduction.
+
+ | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
+ |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
+ | [**BAAI/bge-large-zh-v1.5**](https://huggingface.co/BAAI/bge-large-zh-v1.5) | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
+ | [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
+ | [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
+ | [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
+ | [bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct) | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
+ | [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
+ | [multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
+ | [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
+ | [m3e-base](https://huggingface.co/moka-ai/m3e-base) | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
+ | [m3e-large](https://huggingface.co/moka-ai/m3e-large) | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
+ | [multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
+ | [multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
+ | [text-embedding-ada-002(OpenAI)](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
+ | [luotuo](https://huggingface.co/silk-road/luotuo-bert-medium) | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
+ | [text2vec-base](https://huggingface.co/shibing624/text2vec-base-chinese) | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
+ | [text2vec-large](https://huggingface.co/GanymedeNil/text2vec-large-chinese) | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |
+
+
+ - **Reranking**:
+ See [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/) for the evaluation script.
+
+ | Model | T2Reranking | T2RerankingZh2En\* | T2RerankingEn2Zh\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
+ |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
+ | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
+ | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
+ | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
+ | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
+ | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
+ | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
+ | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
+ | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
+ | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
+ | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 |
+
+ \* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks.
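For the rows checked below, the Avg column matches an unweighted mean of the six dataset scores to two decimals. That is an observation from the numbers, not a definition stated in this README. A quick check, with scores copied from the reranking table (`unweighted_mean` is a hypothetical helper):

```python
# Scores copied from the reranking table; column order:
# T2Reranking, T2RerankingZh2En, T2RerankingEn2Zh, MMarcoReranking, CMedQAv1, CMedQAv2
rows = {
    "text2vec-base-multilingual": ([64.66, 62.94, 62.51, 14.37, 48.46, 48.6], 50.26),
    "BAAI/bge-reranker-base": ([67.28, 63.95, 60.45, 35.46, 81.26, 84.1], 65.42),
    "BAAI/bge-reranker-large": ([67.6, 64.03, 61.44, 37.16, 82.15, 84.18], 66.09),
}


def unweighted_mean(values):
    """Plain arithmetic mean: every dataset counts equally (an assumption about Avg)."""
    return sum(values) / len(values)


for name, (scores, reported_avg) in rows.items():
    mean = unweighted_mean(scores)
    print(f"{name}: mean={mean:.4f} reported={reported_avg}")
    # Reported values are rounded to two decimals, so allow half a unit in the last place.
    assert abs(mean - reported_avg) <= 0.005, name
```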
+
+ ## Train
+
+ ### BAAI Embedding
+
+ We pre-train the models using [RetroMAE](https://github.com/staoxiao/RetroMAE) and then train them on large-scale pair data using contrastive learning.
+ **You can fine-tune the embedding model on your data following our [examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune).**
+ We also provide a [pre-train example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain).
+ Note that the goal of pre-training is to reconstruct the text, so the pre-trained model cannot be used for similarity calculation directly; it must be fine-tuned first.
+ For more training details on bge, see [baai_general_embedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md).
+
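The paragraph above only says the models are trained with contrastive learning on pair data; the exact loss, temperature, and negative-sampling scheme are described in the linked FlagEmbedding docs. As a generic illustration, here is an InfoNCE loss with in-batch negatives in NumPy. The temperature of 0.05 and the use of in-batch negatives only are assumptions for this sketch, not values taken from this repository, and the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def info_nce_loss(queries: np.ndarray, passages: np.ndarray, temperature: float = 0.05) -> float:
    """InfoNCE with in-batch negatives (temperature is an assumed value).

    Row i of `queries` is paired with row i of `passages` (its positive);
    every other passage in the batch acts as a negative for that query.
    """
    q = l2_normalize(queries)
    p = l2_normalize(passages)
    logits = q @ p.T / temperature                        # (batch, batch) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for query i is passage i (the diagonal).
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 384))                     # 384 = hidden size of bge-small
aligned = q + 0.01 * rng.normal(size=q.shape)     # positives close to their queries
shuffled = aligned[[1, 2, 3, 0]]                  # positives deliberately mismatched
print(info_nce_loss(q, aligned), info_nce_loss(q, shuffled))
```

The loss is near zero when each query's positive is its most similar passage in the batch and grows when another passage outscores it, which is what pushes paired texts together and unpaired ones apart.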
+
+
+ ### BGE Reranker
+
+ A cross-encoder performs full attention over the input pair,
+ which makes it more accurate than an embedding model (i.e., a bi-encoder) but also more time-consuming, since every query-document pair must be passed through the model together at query time.
+ Therefore, it is best used to re-rank the top-k documents returned by an embedding model.
+ We train the cross-encoder on multilingual pair data.
+ The data format is the same as for the embedding model, so you can easily fine-tune it following our [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker).
+ For more details, please refer to [./FlagEmbedding/reranker/README.md](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker).
+
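The retrieve-then-rerank pattern described above can be sketched as follows. To keep it self-contained, `toy_embed` and `toy_cross_score` are hypothetical stand-ins: in practice the first stage would use a bi-encoder such as this bge-small-en-v1.5 model and the second a cross-encoder such as BAAI/bge-reranker-base. The bag-of-words versions below exist only so the sketch runs; their scores mean nothing beyond word overlap.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(tokens(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_cross_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: it looks at query and doc together."""
    q = set(tokens(query))
    return len(q & set(tokens(doc))) / len(q) if q else 0.0


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    embed: Callable[[str], Counter],
    cross_score: Callable[[str, str], float],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    # Stage 1 (bi-encoder): query and documents are embedded independently,
    # so document vectors could be precomputed and indexed ahead of time.
    q_vec = embed(query)
    candidates = sorted(corpus, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:top_k]
    # Stage 2 (cross-encoder): score each (query, doc) pair jointly, but only for top_k docs.
    rescored = [(doc, cross_score(query, doc)) for doc in candidates]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


corpus = [
    "The giant panda is a bear species endemic to China.",
    "hi",
    "Pandas is a Python library for data analysis.",
    "Red pandas live in the eastern Himalayas.",
]
query = "what is a giant panda?"
for doc, score in retrieve_then_rerank(query, corpus, toy_embed, toy_cross_score):
    print(f"{score:.2f}  {doc}")
```

The design point is cost: the expensive joint scoring runs on `top_k` candidates instead of the whole corpus.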
+
+
+ ## Contact
+ If you have any questions or suggestions related to this project, feel free to open an issue or pull request.
+ You can also email Shitao Xiao (stxiao@baai.ac.cn) and Zheng Liu (liuzheng@baai.ac.cn).
+
+
+ ## Citation
+
+ If you find this repository useful, please consider giving it a star :star: and citing it:
+
+ ```
+ @misc{bge_embedding,
+       title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
+       author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
+       year={2023},
+       eprint={2309.07597},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL}
+ }
+ ```
+
+ ## License
+ FlagEmbedding is licensed under the [MIT License](https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE). The released models can be used for commercial purposes free of charge.
+
config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "_name_or_path": "/root/.cache/torch/sentence_transformers/BAAI_bge-small-en/",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.30.0",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522,
+   "bos_token": "<s>",
+   "eos_token": "</s>",
+   "layer_norm_epsilon": 1e-12,
+   "unk_token": "[UNK]"
+ }
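As a sanity check on this config, the standard BERT encoder layout gives a parameter count directly from these fields. The sketch assumes the usual BERT structure (biases on every linear layer, two LayerNorms per layer, one embedding LayerNorm) plus a pooler; whether the CTranslate2 `model.bin` in this commit actually stores the pooler is an assumption. The result, about 33.4M parameters, is roughly half the 66,728,364 bytes listed for `model.bin`, which would be consistent with 16-bit weights, though nothing in these files states the storage precision. `bert_param_count` is a hypothetical helper.

```python
def bert_param_count(cfg: dict, include_pooler: bool = True) -> int:
    """Count parameters of a standard BERT encoder (assumed layout) from its config."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    layer_norm = 2 * h  # weight + bias
    embeddings = (cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]) * h + layer_norm
    attention = 4 * (h * h + h) + layer_norm       # Q, K, V, output projections + LayerNorm
    ffn = (h * i + i) + (i * h + h) + layer_norm   # up-projection, down-projection + LayerNorm
    total = embeddings + cfg["num_hidden_layers"] * (attention + ffn)
    if include_pooler:
        total += h * h + h  # dense layer applied to [CLS] (assumed present)
    return total


# Values copied from config.json above.
cfg = {
    "hidden_size": 384,
    "intermediate_size": 1536,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "vocab_size": 30522,
}
head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
n_params = bert_param_count(cfg)
print(head_dim, n_params)  # 32 33360000
print(f"{66728364 / n_params:.2f} bytes per parameter")  # 2.00 bytes per parameter
```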
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.2.2",
+     "transformers": "4.28.1",
+     "pytorch": "1.13.0+cu117"
+   }
+ }
model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d6181a7549b51c876d1daadd4e0ef6eef73e9eea24cac51b738993b4f6899ea
+ size 66728364
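This entry is a Git LFS pointer, not the weights themselves: `oid` is the SHA-256 of the real file's contents and `size` is its length in bytes. After downloading the actual `model.bin`, a check along these lines confirms you received the right object. The local path `model.bin` is an assumption, and `verify_lfs_object` is a hypothetical helper.

```python
import hashlib
import os

# Values copied from the LFS pointer above.
EXPECTED_OID = "3d6181a7549b51c876d1daadd4e0ef6eef73e9eea24cac51b738993b4f6899ea"
EXPECTED_SIZE = 66728364


def verify_lfs_object(path: str, oid: str = EXPECTED_OID, size: int = EXPECTED_SIZE,
                      chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and sha256 oid."""
    if os.path.getsize(path) != size:
        return False  # cheap check first; catches truncated or wrong downloads without hashing
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so the whole file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == oid


if os.path.exists("model.bin"):  # assumed local path of the downloaded weights
    print(verify_lfs_object("model.bin"))
```

If the file on disk is only about 130 bytes, you most likely have the pointer itself rather than the LFS object, and the size check will fail immediately.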
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
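`modules.json` chains three sentence-transformers modules: the BERT encoder, a pooling layer, and L2 normalization. The pooling settings live in `1_Pooling/config.json`, which this commit view does not show; BGE's documentation describes using the last hidden state of the `[CLS]` token, and this NumPy sketch assumes CLS pooling. Random arrays stand in for encoder output, and `cls_pool`/`normalize` are hypothetical names. Because of the final Normalize step, the dot product of two output embeddings equals their cosine similarity.

```python
import numpy as np


def cls_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Assumed CLS pooling: keep the first token of each sequence, (batch, seq, dim) -> (batch, dim)."""
    return hidden_states[:, 0, :]


def normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2-normalize each row, mirroring the Normalize module."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 7, 384))   # stand-in encoder output: 2 texts, 7 tokens, 384 dims
emb = normalize(cls_pool(hidden))

a, b = cls_pool(hidden)
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.allclose(emb[0] @ emb[1], cosine))  # True
```

This is why vector stores serving these embeddings can use plain inner-product search without changing the ranking relative to cosine similarity.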
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
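In sentence-transformers, `max_seq_length` caps the tokenized length including the special tokens, and `do_lower_case` lowercases the text before it is tokenized. A minimal sketch of that budget for a single text, using a whitespace split as a hypothetical stand-in for the real WordPiece tokenizer (so real token counts will differ); `prepare` is an illustrative name.

```python
from typing import List

MAX_SEQ_LENGTH = 512       # from sentence_bert_config.json
DO_LOWER_CASE = True       # from sentence_bert_config.json
NUM_SPECIAL_TOKENS = 2     # [CLS] ... [SEP] around a single text


def prepare(text: str) -> List[str]:
    """Lowercase, split, and truncate so the sequence fits in MAX_SEQ_LENGTH."""
    if DO_LOWER_CASE:
        text = text.lower()
    pieces = text.split()  # hypothetical stand-in for WordPiece tokenization
    budget = MAX_SEQ_LENGTH - NUM_SPECIAL_TOKENS  # room left for content tokens
    return ["[CLS]"] + pieces[:budget] + ["[SEP]"]


prepared = prepare("Word " * 1000)
print(len(prepared), prepared[:3])  # 512 ['[CLS]', 'word', 'word']
```

Anything past the budget is silently dropped, so long documents are usually chunked before embedding rather than passed whole.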
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
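These special tokens define BERT's input layout: `[CLS]` first, `[SEP]` after each segment, and `[PAD]` to fill a batch to equal length. The sketch below builds that layout at the token-string level for a single text and for a text pair (segment ids 0 and 1, matching `type_vocab_size: 2` in `config.json`), plus the attention mask that tells the model to ignore padding. It works on pre-split tokens; a real tokenizer would also map them to ids (`pad_token_id` is 0 in `config.json`). `build_inputs` is a hypothetical helper, not a transformers API.

```python
from typing import List, Optional, Tuple

CLS, SEP, PAD = "[CLS]", "[SEP]", "[PAD]"


def build_inputs(
    a: List[str], b: Optional[List[str]] = None, pad_to: int = 0
) -> Tuple[List[str], List[int], List[int]]:
    """Return (tokens, token_type_ids, attention_mask) in BERT's layout."""
    tokens = [CLS] + a + [SEP]
    types = [0] * len(tokens)          # segment A, including [CLS] and its [SEP]
    if b is not None:
        tokens += b + [SEP]
        types += [1] * (len(b) + 1)    # segment B and its closing [SEP]
    mask = [1] * len(tokens)           # real tokens are attended to
    n_pad = max(0, pad_to - len(tokens))
    tokens += [PAD] * n_pad
    types += [0] * n_pad               # padding uses segment id 0
    mask += [0] * n_pad                # padding is masked out
    return tokens, types, mask


print(build_inputs(["what", "is", "panda", "?"], ["hi"], pad_to=10))
```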
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
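`model_max_length` here is not a real limit: 1000000000000000019884624838656 is `int(1e30)`, the placeholder (`VERY_LARGE_INTEGER`) that the transformers library uses when no maximum length was configured. The usable length is bounded by `max_position_embeddings: 512` in `config.json`, and by `max_seq_length: 512` in `sentence_bert_config.json` when loaded through sentence-transformers. This is also why the reranker example passes `max_length=512` explicitly. A small sketch of picking the effective limit, treating the placeholder as "unset"; `effective_max_length` is a hypothetical helper.

```python
from typing import Optional

VERY_LARGE_INTEGER = int(1e30)  # placeholder transformers uses for "no limit configured"


def effective_max_length(model_max_length: int, max_position_embeddings: int,
                         max_seq_length: Optional[int] = None) -> int:
    """Smallest real limit; the tokenizer value is ignored if it is the placeholder."""
    limits = [max_position_embeddings]  # the model cannot embed positions beyond this
    if model_max_length < VERY_LARGE_INTEGER:
        limits.append(model_max_length)
    if max_seq_length is not None:
        limits.append(max_seq_length)
    return min(limits)


print(VERY_LARGE_INTEGER == 1000000000000000019884624838656)  # True
print(effective_max_length(1000000000000000019884624838656, 512, 512))  # 512
```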
vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.txt ADDED
The diff for this file is too large to render. See raw diff