infgrad committed on
Commit c4f846f
1 Parent(s): 313a246

Upload 7 files

Files changed (7)
  1. README.md +2913 -0
  2. config.json +31 -0
  3. pytorch_model.bin +3 -0
  4. special_tokens_map.json +7 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +15 -0
  7. vocab.txt +0 -0
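In the README diff, the numbers for the aggregate `MTEB CQADupstackRetrieval` entry are consistent with an unweighted mean of the twelve per-subforum `CQADupstack*Retrieval` entries (Android through Wordpress). Below is a minimal Python check of that relationship. It uses the `ndcg_at_10` values copied from those entries; the dictionary keys are illustrative labels only, and the averaging is inferred from the numbers, not taken from the model card.

```python
from statistics import mean

# ndcg_at_10 for each CQADupstack subforum, copied from the model-index entries
ndcg_at_10 = {
    "android": 50.573,
    "english": 45.5,
    "gaming": 59.192,
    "gis": 40.263,
    "mathematica": 30.294999999999998,
    "physics": 47.158,
    "programmers": 41.629,
    "stats": 35.146,
    "tex": 29.387999999999998,
    "unix": 41.127,
    "webmasters": 39.848,
    "wordpress": 33.545,
}

# ndcg_at_10 reported for the aggregate "MTEB CQADupstackRetrieval" entry
reported = 41.138666666666666

# Unweighted (macro) average over the twelve subforums
macro_avg = mean(ndcg_at_10.values())
print(f"mean over {len(ndcg_at_10)} subforums: {macro_avg:.4f} (reported: {reported:.4f})")
assert abs(macro_avg - reported) < 1e-6
```

The same holds for other metrics. For example, the twelve `map_at_1` values average to 26.511, which matches the reported 26.510999999999996.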
README.md CHANGED
@@ -1,3 +1,2916 @@
  ---
  license: mit
  ---
  ---
+ tags:
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - mteb
+ model-index:
+ - name: stella-base-en-v2
+   results:
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_counterfactual
+       name: MTEB AmazonCounterfactualClassification (en)
+       config: en
+       split: test
+       revision: e8379541af4e31359cca9fbcf4b00f2671dba205
+     metrics:
+     - type: accuracy
+       value: 77.19402985074628
+     - type: ap
+       value: 40.43267503017359
+     - type: f1
+       value: 71.15585210518594
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_polarity
+       name: MTEB AmazonPolarityClassification
+       config: default
+       split: test
+       revision: e2d317d38cd51312af73b3d32a06d1a08b442046
+     metrics:
+     - type: accuracy
+       value: 93.256675
+     - type: ap
+       value: 90.00824833079179
+     - type: f1
+       value: 93.2473146151734
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_reviews_multi
+       name: MTEB AmazonReviewsClassification (en)
+       config: en
+       split: test
+       revision: 1399c76144fd37290681b995c656ef9b2e06e26d
+     metrics:
+     - type: accuracy
+       value: 49.612
+     - type: f1
+       value: 48.530785631574304
+   - task:
+       type: Retrieval
+     dataset:
+       type: arguana
+       name: MTEB ArguAna
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 37.411
+     - type: map_at_10
+       value: 52.673
+     - type: map_at_100
+       value: 53.410999999999994
+     - type: map_at_1000
+       value: 53.415
+     - type: map_at_3
+       value: 48.495
+     - type: map_at_5
+       value: 51.183
+     - type: mrr_at_1
+       value: 37.838
+     - type: mrr_at_10
+       value: 52.844
+     - type: mrr_at_100
+       value: 53.581999999999994
+     - type: mrr_at_1000
+       value: 53.586
+     - type: mrr_at_3
+       value: 48.672
+     - type: mrr_at_5
+       value: 51.272
+     - type: ndcg_at_1
+       value: 37.411
+     - type: ndcg_at_10
+       value: 60.626999999999995
+     - type: ndcg_at_100
+       value: 63.675000000000004
+     - type: ndcg_at_1000
+       value: 63.776999999999994
+     - type: ndcg_at_3
+       value: 52.148
+     - type: ndcg_at_5
+       value: 57.001999999999995
+     - type: precision_at_1
+       value: 37.411
+     - type: precision_at_10
+       value: 8.578
+     - type: precision_at_100
+       value: 0.989
+     - type: precision_at_1000
+       value: 0.1
+     - type: precision_at_3
+       value: 20.91
+     - type: precision_at_5
+       value: 14.908
+     - type: recall_at_1
+       value: 37.411
+     - type: recall_at_10
+       value: 85.775
+     - type: recall_at_100
+       value: 98.86200000000001
+     - type: recall_at_1000
+       value: 99.644
+     - type: recall_at_3
+       value: 62.731
+     - type: recall_at_5
+       value: 74.53800000000001
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/arxiv-clustering-p2p
+       name: MTEB ArxivClusteringP2P
+       config: default
+       split: test
+       revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
+     metrics:
+     - type: v_measure
+       value: 47.24219029437865
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/arxiv-clustering-s2s
+       name: MTEB ArxivClusteringS2S
+       config: default
+       split: test
+       revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
+     metrics:
+     - type: v_measure
+       value: 40.474604844291726
+   - task:
+       type: Reranking
+     dataset:
+       type: mteb/askubuntudupquestions-reranking
+       name: MTEB AskUbuntuDupQuestions
+       config: default
+       split: test
+       revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
+     metrics:
+     - type: map
+       value: 62.720542706366054
+     - type: mrr
+       value: 75.59633733456448
+   - task:
+       type: STS
+     dataset:
+       type: mteb/biosses-sts
+       name: MTEB BIOSSES
+       config: default
+       split: test
+       revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
+     metrics:
+     - type: cos_sim_pearson
+       value: 86.31345008397868
+     - type: cos_sim_spearman
+       value: 85.94292212320399
+     - type: euclidean_pearson
+       value: 85.03974302774525
+     - type: euclidean_spearman
+       value: 85.88087251659051
+     - type: manhattan_pearson
+       value: 84.91900996712951
+     - type: manhattan_spearman
+       value: 85.96701905781116
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/banking77
+       name: MTEB Banking77Classification
+       config: default
+       split: test
+       revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
+     metrics:
+     - type: accuracy
+       value: 84.72727272727273
+     - type: f1
+       value: 84.29572512364581
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/biorxiv-clustering-p2p
+       name: MTEB BiorxivClusteringP2P
+       config: default
+       split: test
+       revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
+     metrics:
+     - type: v_measure
+       value: 39.55532460397536
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/biorxiv-clustering-s2s
+       name: MTEB BiorxivClusteringS2S
+       config: default
+       split: test
+       revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
+     metrics:
+     - type: v_measure
+       value: 35.91195973591251
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackAndroidRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 32.822
+     - type: map_at_10
+       value: 44.139
+     - type: map_at_100
+       value: 45.786
+     - type: map_at_1000
+       value: 45.906000000000006
+     - type: map_at_3
+       value: 40.637
+     - type: map_at_5
+       value: 42.575
+     - type: mrr_at_1
+       value: 41.059
+     - type: mrr_at_10
+       value: 50.751000000000005
+     - type: mrr_at_100
+       value: 51.548
+     - type: mrr_at_1000
+       value: 51.583999999999996
+     - type: mrr_at_3
+       value: 48.236000000000004
+     - type: mrr_at_5
+       value: 49.838
+     - type: ndcg_at_1
+       value: 41.059
+     - type: ndcg_at_10
+       value: 50.573
+     - type: ndcg_at_100
+       value: 56.25
+     - type: ndcg_at_1000
+       value: 58.004
+     - type: ndcg_at_3
+       value: 45.995000000000005
+     - type: ndcg_at_5
+       value: 48.18
+     - type: precision_at_1
+       value: 41.059
+     - type: precision_at_10
+       value: 9.757
+     - type: precision_at_100
+       value: 1.609
+     - type: precision_at_1000
+       value: 0.20600000000000002
+     - type: precision_at_3
+       value: 22.222
+     - type: precision_at_5
+       value: 16.023
+     - type: recall_at_1
+       value: 32.822
+     - type: recall_at_10
+       value: 61.794000000000004
+     - type: recall_at_100
+       value: 85.64699999999999
+     - type: recall_at_1000
+       value: 96.836
+     - type: recall_at_3
+       value: 47.999
+     - type: recall_at_5
+       value: 54.376999999999995
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackEnglishRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 29.579
+     - type: map_at_10
+       value: 39.787
+     - type: map_at_100
+       value: 40.976
+     - type: map_at_1000
+       value: 41.108
+     - type: map_at_3
+       value: 36.819
+     - type: map_at_5
+       value: 38.437
+     - type: mrr_at_1
+       value: 37.516
+     - type: mrr_at_10
+       value: 45.822
+     - type: mrr_at_100
+       value: 46.454
+     - type: mrr_at_1000
+       value: 46.495999999999995
+     - type: mrr_at_3
+       value: 43.556
+     - type: mrr_at_5
+       value: 44.814
+     - type: ndcg_at_1
+       value: 37.516
+     - type: ndcg_at_10
+       value: 45.5
+     - type: ndcg_at_100
+       value: 49.707
+     - type: ndcg_at_1000
+       value: 51.842
+     - type: ndcg_at_3
+       value: 41.369
+     - type: ndcg_at_5
+       value: 43.161
+     - type: precision_at_1
+       value: 37.516
+     - type: precision_at_10
+       value: 8.713
+     - type: precision_at_100
+       value: 1.38
+     - type: precision_at_1000
+       value: 0.188
+     - type: precision_at_3
+       value: 20.233999999999998
+     - type: precision_at_5
+       value: 14.280000000000001
+     - type: recall_at_1
+       value: 29.579
+     - type: recall_at_10
+       value: 55.458
+     - type: recall_at_100
+       value: 73.49799999999999
+     - type: recall_at_1000
+       value: 87.08200000000001
+     - type: recall_at_3
+       value: 42.858000000000004
+     - type: recall_at_5
+       value: 48.215
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackGamingRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 40.489999999999995
+     - type: map_at_10
+       value: 53.313
+     - type: map_at_100
+       value: 54.290000000000006
+     - type: map_at_1000
+       value: 54.346000000000004
+     - type: map_at_3
+       value: 49.983
+     - type: map_at_5
+       value: 51.867
+     - type: mrr_at_1
+       value: 46.27
+     - type: mrr_at_10
+       value: 56.660999999999994
+     - type: mrr_at_100
+       value: 57.274
+     - type: mrr_at_1000
+       value: 57.301
+     - type: mrr_at_3
+       value: 54.138
+     - type: mrr_at_5
+       value: 55.623999999999995
+     - type: ndcg_at_1
+       value: 46.27
+     - type: ndcg_at_10
+       value: 59.192
+     - type: ndcg_at_100
+       value: 63.026
+     - type: ndcg_at_1000
+       value: 64.079
+     - type: ndcg_at_3
+       value: 53.656000000000006
+     - type: ndcg_at_5
+       value: 56.387
+     - type: precision_at_1
+       value: 46.27
+     - type: precision_at_10
+       value: 9.511
+     - type: precision_at_100
+       value: 1.23
+     - type: precision_at_1000
+       value: 0.136
+     - type: precision_at_3
+       value: 24.096
+     - type: precision_at_5
+       value: 16.476
+     - type: recall_at_1
+       value: 40.489999999999995
+     - type: recall_at_10
+       value: 73.148
+     - type: recall_at_100
+       value: 89.723
+     - type: recall_at_1000
+       value: 97.073
+     - type: recall_at_3
+       value: 58.363
+     - type: recall_at_5
+       value: 65.083
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackGisRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 26.197
+     - type: map_at_10
+       value: 35.135
+     - type: map_at_100
+       value: 36.14
+     - type: map_at_1000
+       value: 36.216
+     - type: map_at_3
+       value: 32.358
+     - type: map_at_5
+       value: 33.814
+     - type: mrr_at_1
+       value: 28.475
+     - type: mrr_at_10
+       value: 37.096000000000004
+     - type: mrr_at_100
+       value: 38.006
+     - type: mrr_at_1000
+       value: 38.06
+     - type: mrr_at_3
+       value: 34.52
+     - type: mrr_at_5
+       value: 35.994
+     - type: ndcg_at_1
+       value: 28.475
+     - type: ndcg_at_10
+       value: 40.263
+     - type: ndcg_at_100
+       value: 45.327
+     - type: ndcg_at_1000
+       value: 47.225
+     - type: ndcg_at_3
+       value: 34.882000000000005
+     - type: ndcg_at_5
+       value: 37.347
+     - type: precision_at_1
+       value: 28.475
+     - type: precision_at_10
+       value: 6.249
+     - type: precision_at_100
+       value: 0.919
+     - type: precision_at_1000
+       value: 0.11199999999999999
+     - type: precision_at_3
+       value: 14.689
+     - type: precision_at_5
+       value: 10.237
+     - type: recall_at_1
+       value: 26.197
+     - type: recall_at_10
+       value: 54.17999999999999
+     - type: recall_at_100
+       value: 77.768
+     - type: recall_at_1000
+       value: 91.932
+     - type: recall_at_3
+       value: 39.804
+     - type: recall_at_5
+       value: 45.660000000000004
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackMathematicaRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 16.683
+     - type: map_at_10
+       value: 25.013999999999996
+     - type: map_at_100
+       value: 26.411
+     - type: map_at_1000
+       value: 26.531
+     - type: map_at_3
+       value: 22.357
+     - type: map_at_5
+       value: 23.982999999999997
+     - type: mrr_at_1
+       value: 20.896
+     - type: mrr_at_10
+       value: 29.758000000000003
+     - type: mrr_at_100
+       value: 30.895
+     - type: mrr_at_1000
+       value: 30.964999999999996
+     - type: mrr_at_3
+       value: 27.177
+     - type: mrr_at_5
+       value: 28.799999999999997
+     - type: ndcg_at_1
+       value: 20.896
+     - type: ndcg_at_10
+       value: 30.294999999999998
+     - type: ndcg_at_100
+       value: 36.68
+     - type: ndcg_at_1000
+       value: 39.519
+     - type: ndcg_at_3
+       value: 25.480999999999998
+     - type: ndcg_at_5
+       value: 28.027
+     - type: precision_at_1
+       value: 20.896
+     - type: precision_at_10
+       value: 5.56
+     - type: precision_at_100
+       value: 1.006
+     - type: precision_at_1000
+       value: 0.13899999999999998
+     - type: precision_at_3
+       value: 12.231
+     - type: precision_at_5
+       value: 9.104
+     - type: recall_at_1
+       value: 16.683
+     - type: recall_at_10
+       value: 41.807
+     - type: recall_at_100
+       value: 69.219
+     - type: recall_at_1000
+       value: 89.178
+     - type: recall_at_3
+       value: 28.772
+     - type: recall_at_5
+       value: 35.167
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackPhysicsRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 30.653000000000002
+     - type: map_at_10
+       value: 41.21
+     - type: map_at_100
+       value: 42.543
+     - type: map_at_1000
+       value: 42.657000000000004
+     - type: map_at_3
+       value: 38.094
+     - type: map_at_5
+       value: 39.966
+     - type: mrr_at_1
+       value: 37.824999999999996
+     - type: mrr_at_10
+       value: 47.087
+     - type: mrr_at_100
+       value: 47.959
+     - type: mrr_at_1000
+       value: 48.003
+     - type: mrr_at_3
+       value: 45.043
+     - type: mrr_at_5
+       value: 46.352
+     - type: ndcg_at_1
+       value: 37.824999999999996
+     - type: ndcg_at_10
+       value: 47.158
+     - type: ndcg_at_100
+       value: 52.65
+     - type: ndcg_at_1000
+       value: 54.644999999999996
+     - type: ndcg_at_3
+       value: 42.632999999999996
+     - type: ndcg_at_5
+       value: 44.994
+     - type: precision_at_1
+       value: 37.824999999999996
+     - type: precision_at_10
+       value: 8.498999999999999
+     - type: precision_at_100
+       value: 1.308
+     - type: precision_at_1000
+       value: 0.166
+     - type: precision_at_3
+       value: 20.308
+     - type: precision_at_5
+       value: 14.283000000000001
+     - type: recall_at_1
+       value: 30.653000000000002
+     - type: recall_at_10
+       value: 58.826
+     - type: recall_at_100
+       value: 81.94
+     - type: recall_at_1000
+       value: 94.71000000000001
+     - type: recall_at_3
+       value: 45.965
+     - type: recall_at_5
+       value: 52.294
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackProgrammersRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 26.71
+     - type: map_at_10
+       value: 36.001
+     - type: map_at_100
+       value: 37.416
+     - type: map_at_1000
+       value: 37.522
+     - type: map_at_3
+       value: 32.841
+     - type: map_at_5
+       value: 34.515
+     - type: mrr_at_1
+       value: 32.647999999999996
+     - type: mrr_at_10
+       value: 41.43
+     - type: mrr_at_100
+       value: 42.433
+     - type: mrr_at_1000
+       value: 42.482
+     - type: mrr_at_3
+       value: 39.117000000000004
+     - type: mrr_at_5
+       value: 40.35
+     - type: ndcg_at_1
+       value: 32.647999999999996
+     - type: ndcg_at_10
+       value: 41.629
+     - type: ndcg_at_100
+       value: 47.707
+     - type: ndcg_at_1000
+       value: 49.913000000000004
+     - type: ndcg_at_3
+       value: 36.598000000000006
+     - type: ndcg_at_5
+       value: 38.696000000000005
+     - type: precision_at_1
+       value: 32.647999999999996
+     - type: precision_at_10
+       value: 7.704999999999999
+     - type: precision_at_100
+       value: 1.242
+     - type: precision_at_1000
+       value: 0.16
+     - type: precision_at_3
+       value: 17.314
+     - type: precision_at_5
+       value: 12.374
+     - type: recall_at_1
+       value: 26.71
+     - type: recall_at_10
+       value: 52.898
+     - type: recall_at_100
+       value: 79.08
+     - type: recall_at_1000
+       value: 93.94
+     - type: recall_at_3
+       value: 38.731
+     - type: recall_at_5
+       value: 44.433
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 26.510999999999996
+     - type: map_at_10
+       value: 35.755333333333326
+     - type: map_at_100
+       value: 36.97525
+     - type: map_at_1000
+       value: 37.08741666666667
+     - type: map_at_3
+       value: 32.921
+     - type: map_at_5
+       value: 34.45041666666667
+     - type: mrr_at_1
+       value: 31.578416666666666
+     - type: mrr_at_10
+       value: 40.06066666666667
+     - type: mrr_at_100
+       value: 40.93350000000001
+     - type: mrr_at_1000
+       value: 40.98716666666667
+     - type: mrr_at_3
+       value: 37.710499999999996
+     - type: mrr_at_5
+       value: 39.033249999999995
+     - type: ndcg_at_1
+       value: 31.578416666666666
+     - type: ndcg_at_10
+       value: 41.138666666666666
+     - type: ndcg_at_100
+       value: 46.37291666666666
+     - type: ndcg_at_1000
+       value: 48.587500000000006
+     - type: ndcg_at_3
+       value: 36.397083333333335
+     - type: ndcg_at_5
+       value: 38.539
+     - type: precision_at_1
+       value: 31.578416666666666
+     - type: precision_at_10
+       value: 7.221583333333332
+     - type: precision_at_100
+       value: 1.1581666666666668
+     - type: precision_at_1000
+       value: 0.15416666666666667
+     - type: precision_at_3
+       value: 16.758
+     - type: precision_at_5
+       value: 11.830916666666665
+     - type: recall_at_1
+       value: 26.510999999999996
+     - type: recall_at_10
+       value: 52.7825
+     - type: recall_at_100
+       value: 75.79675
+     - type: recall_at_1000
+       value: 91.10483333333335
+     - type: recall_at_3
+       value: 39.48233333333334
+     - type: recall_at_5
+       value: 45.07116666666667
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackStatsRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 24.564
+     - type: map_at_10
+       value: 31.235000000000003
+     - type: map_at_100
+       value: 32.124
+     - type: map_at_1000
+       value: 32.216
+     - type: map_at_3
+       value: 29.330000000000002
+     - type: map_at_5
+       value: 30.379
+     - type: mrr_at_1
+       value: 27.761000000000003
+     - type: mrr_at_10
+       value: 34.093
+     - type: mrr_at_100
+       value: 34.885
+     - type: mrr_at_1000
+       value: 34.957
+     - type: mrr_at_3
+       value: 32.388
+     - type: mrr_at_5
+       value: 33.269
+     - type: ndcg_at_1
+       value: 27.761000000000003
+     - type: ndcg_at_10
+       value: 35.146
+     - type: ndcg_at_100
+       value: 39.597
+     - type: ndcg_at_1000
+       value: 42.163000000000004
+     - type: ndcg_at_3
+       value: 31.674000000000003
+     - type: ndcg_at_5
+       value: 33.224
+     - type: precision_at_1
+       value: 27.761000000000003
+     - type: precision_at_10
+       value: 5.383
+     - type: precision_at_100
+       value: 0.836
+     - type: precision_at_1000
+       value: 0.11199999999999999
+     - type: precision_at_3
+       value: 13.599
+     - type: precision_at_5
+       value: 9.202
+     - type: recall_at_1
+       value: 24.564
+     - type: recall_at_10
+       value: 44.36
+     - type: recall_at_100
+       value: 64.408
+     - type: recall_at_1000
+       value: 83.892
+     - type: recall_at_3
+       value: 34.653
+     - type: recall_at_5
+       value: 38.589
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackTexRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 17.01
+     - type: map_at_10
+       value: 24.485
+     - type: map_at_100
+       value: 25.573
+     - type: map_at_1000
+       value: 25.703
+     - type: map_at_3
+       value: 21.953
+     - type: map_at_5
+       value: 23.294999999999998
+     - type: mrr_at_1
+       value: 20.544
+     - type: mrr_at_10
+       value: 28.238000000000003
+     - type: mrr_at_100
+       value: 29.142000000000003
+     - type: mrr_at_1000
+       value: 29.219
+     - type: mrr_at_3
+       value: 25.802999999999997
+     - type: mrr_at_5
+       value: 27.105
+     - type: ndcg_at_1
+       value: 20.544
+     - type: ndcg_at_10
+       value: 29.387999999999998
+     - type: ndcg_at_100
+       value: 34.603
+     - type: ndcg_at_1000
+       value: 37.564
+     - type: ndcg_at_3
+       value: 24.731
+     - type: ndcg_at_5
+       value: 26.773000000000003
+     - type: precision_at_1
+       value: 20.544
+     - type: precision_at_10
+       value: 5.509
+     - type: precision_at_100
+       value: 0.9450000000000001
+     - type: precision_at_1000
+       value: 0.13799999999999998
+     - type: precision_at_3
+       value: 11.757
+     - type: precision_at_5
+       value: 8.596
+     - type: recall_at_1
+       value: 17.01
+     - type: recall_at_10
+       value: 40.392
+     - type: recall_at_100
+       value: 64.043
+     - type: recall_at_1000
+       value: 85.031
+     - type: recall_at_3
+       value: 27.293
+     - type: recall_at_5
+       value: 32.586999999999996
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackUnixRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 27.155
+     - type: map_at_10
+       value: 35.92
+     - type: map_at_100
+       value: 37.034
+     - type: map_at_1000
+       value: 37.139
+     - type: map_at_3
+       value: 33.263999999999996
+     - type: map_at_5
+       value: 34.61
+     - type: mrr_at_1
+       value: 32.183
+     - type: mrr_at_10
+       value: 40.099000000000004
+     - type: mrr_at_100
+       value: 41.001
+     - type: mrr_at_1000
+       value: 41.059
+     - type: mrr_at_3
+       value: 37.889
+     - type: mrr_at_5
+       value: 39.007999999999996
+     - type: ndcg_at_1
+       value: 32.183
+     - type: ndcg_at_10
+       value: 41.127
+     - type: ndcg_at_100
+       value: 46.464
+     - type: ndcg_at_1000
+       value: 48.67
+     - type: ndcg_at_3
+       value: 36.396
+     - type: ndcg_at_5
+       value: 38.313
+     - type: precision_at_1
+       value: 32.183
+     - type: precision_at_10
+       value: 6.847
+     - type: precision_at_100
+       value: 1.0739999999999998
+     - type: precision_at_1000
+       value: 0.13699999999999998
+     - type: precision_at_3
+       value: 16.356
+     - type: precision_at_5
+       value: 11.362
+     - type: recall_at_1
+       value: 27.155
+     - type: recall_at_10
+       value: 52.922000000000004
+     - type: recall_at_100
+       value: 76.39
+     - type: recall_at_1000
+       value: 91.553
+     - type: recall_at_3
+       value: 39.745999999999995
+     - type: recall_at_5
+       value: 44.637
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackWebmastersRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 25.523
+     - type: map_at_10
+       value: 34.268
+     - type: map_at_100
+       value: 35.835
+     - type: map_at_1000
+       value: 36.046
+     - type: map_at_3
+       value: 31.662000000000003
+     - type: map_at_5
+       value: 32.71
+     - type: mrr_at_1
+       value: 31.028
+     - type: mrr_at_10
+       value: 38.924
+     - type: mrr_at_100
+       value: 39.95
+     - type: mrr_at_1000
+       value: 40.003
+     - type: mrr_at_3
+       value: 36.594
+     - type: mrr_at_5
+       value: 37.701
+     - type: ndcg_at_1
+       value: 31.028
+     - type: ndcg_at_10
+       value: 39.848
+     - type: ndcg_at_100
+       value: 45.721000000000004
+     - type: ndcg_at_1000
+       value: 48.424
+     - type: ndcg_at_3
+       value: 35.329
+     - type: ndcg_at_5
+       value: 36.779
+     - type: precision_at_1
+       value: 31.028
+     - type: precision_at_10
+       value: 7.51
+     - type: precision_at_100
+       value: 1.478
+     - type: precision_at_1000
+       value: 0.24
+     - type: precision_at_3
+       value: 16.337
+     - type: precision_at_5
+       value: 11.383000000000001
+     - type: recall_at_1
+       value: 25.523
+     - type: recall_at_10
+       value: 50.735
+     - type: recall_at_100
+       value: 76.593
+     - type: recall_at_1000
+       value: 93.771
+     - type: recall_at_3
+       value: 37.574000000000005
+     - type: recall_at_5
+       value: 41.602
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackWordpressRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 20.746000000000002
+     - type: map_at_10
+       value: 28.557
+     - type: map_at_100
+       value: 29.575000000000003
+     - type: map_at_1000
+       value: 29.659000000000002
+     - type: map_at_3
+       value: 25.753999999999998
+     - type: map_at_5
+       value: 27.254
+     - type: mrr_at_1
+       value: 22.736
+     - type: mrr_at_10
+       value: 30.769000000000002
+     - type: mrr_at_100
+       value: 31.655
+     - type: mrr_at_1000
+       value: 31.717000000000002
+     - type: mrr_at_3
+       value: 28.065
+     - type: mrr_at_5
+       value: 29.543999999999997
+     - type: ndcg_at_1
+       value: 22.736
+     - type: ndcg_at_10
+       value: 33.545
+     - type: ndcg_at_100
+       value: 38.743
+     - type: ndcg_at_1000
+       value: 41.002
+     - type: ndcg_at_3
+       value: 28.021
+     - type: ndcg_at_5
+       value: 30.586999999999996
+     - type: precision_at_1
+       value: 22.736
+     - type: precision_at_10
+       value: 5.416
+     - type: precision_at_100
+       value: 0.8710000000000001
+     - type: precision_at_1000
+       value: 0.116
+     - type: precision_at_3
+       value: 11.953
+     - type: precision_at_5
+       value: 8.651
+     - type: recall_at_1
+       value: 20.746000000000002
+     - type: recall_at_10
+       value: 46.87
+     - type: recall_at_100
+       value: 71.25200000000001
+     - type: recall_at_1000
+       value: 88.26
+     - type: recall_at_3
+       value: 32.029999999999994
+     - type: recall_at_5
+       value: 38.21
1110
+ - task:
+ type: Retrieval
+ dataset:
+ type: climate-fever
+ name: MTEB ClimateFEVER
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 12.105
+ - type: map_at_10
+ value: 20.577
+ - type: map_at_100
+ value: 22.686999999999998
+ - type: map_at_1000
+ value: 22.889
+ - type: map_at_3
+ value: 17.174
+ - type: map_at_5
+ value: 18.807
+ - type: mrr_at_1
+ value: 27.101
+ - type: mrr_at_10
+ value: 38.475
+ - type: mrr_at_100
+ value: 39.491
+ - type: mrr_at_1000
+ value: 39.525
+ - type: mrr_at_3
+ value: 34.886
+ - type: mrr_at_5
+ value: 36.922
+ - type: ndcg_at_1
+ value: 27.101
+ - type: ndcg_at_10
+ value: 29.002
+ - type: ndcg_at_100
+ value: 37.218
+ - type: ndcg_at_1000
+ value: 40.644000000000005
+ - type: ndcg_at_3
+ value: 23.464
+ - type: ndcg_at_5
+ value: 25.262
+ - type: precision_at_1
+ value: 27.101
+ - type: precision_at_10
+ value: 9.179
+ - type: precision_at_100
+ value: 1.806
+ - type: precision_at_1000
+ value: 0.244
+ - type: precision_at_3
+ value: 17.394000000000002
+ - type: precision_at_5
+ value: 13.342
+ - type: recall_at_1
+ value: 12.105
+ - type: recall_at_10
+ value: 35.143
+ - type: recall_at_100
+ value: 63.44499999999999
+ - type: recall_at_1000
+ value: 82.49499999999999
+ - type: recall_at_3
+ value: 21.489
+ - type: recall_at_5
+ value: 26.82
+ - task:
+ type: Retrieval
+ dataset:
+ type: dbpedia-entity
+ name: MTEB DBPedia
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 8.769
+ - type: map_at_10
+ value: 18.619
+ - type: map_at_100
+ value: 26.3
+ - type: map_at_1000
+ value: 28.063
+ - type: map_at_3
+ value: 13.746
+ - type: map_at_5
+ value: 16.035
+ - type: mrr_at_1
+ value: 65.25
+ - type: mrr_at_10
+ value: 73.678
+ - type: mrr_at_100
+ value: 73.993
+ - type: mrr_at_1000
+ value: 74.003
+ - type: mrr_at_3
+ value: 72.042
+ - type: mrr_at_5
+ value: 72.992
+ - type: ndcg_at_1
+ value: 53.625
+ - type: ndcg_at_10
+ value: 39.638
+ - type: ndcg_at_100
+ value: 44.601
+ - type: ndcg_at_1000
+ value: 52.80200000000001
+ - type: ndcg_at_3
+ value: 44.727
+ - type: ndcg_at_5
+ value: 42.199
+ - type: precision_at_1
+ value: 65.25
+ - type: precision_at_10
+ value: 31.025000000000002
+ - type: precision_at_100
+ value: 10.174999999999999
+ - type: precision_at_1000
+ value: 2.0740000000000003
+ - type: precision_at_3
+ value: 48.083
+ - type: precision_at_5
+ value: 40.6
+ - type: recall_at_1
+ value: 8.769
+ - type: recall_at_10
+ value: 23.910999999999998
+ - type: recall_at_100
+ value: 51.202999999999996
+ - type: recall_at_1000
+ value: 77.031
+ - type: recall_at_3
+ value: 15.387999999999998
+ - type: recall_at_5
+ value: 18.919
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/emotion
+ name: MTEB EmotionClassification
+ config: default
+ split: test
+ revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
+ metrics:
+ - type: accuracy
+ value: 54.47
+ - type: f1
+ value: 48.21839043361556
+ - task:
+ type: Retrieval
+ dataset:
+ type: fever
+ name: MTEB FEVER
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 63.564
+ - type: map_at_10
+ value: 74.236
+ - type: map_at_100
+ value: 74.53699999999999
+ - type: map_at_1000
+ value: 74.557
+ - type: map_at_3
+ value: 72.556
+ - type: map_at_5
+ value: 73.656
+ - type: mrr_at_1
+ value: 68.497
+ - type: mrr_at_10
+ value: 78.373
+ - type: mrr_at_100
+ value: 78.54299999999999
+ - type: mrr_at_1000
+ value: 78.549
+ - type: mrr_at_3
+ value: 77.03
+ - type: mrr_at_5
+ value: 77.938
+ - type: ndcg_at_1
+ value: 68.497
+ - type: ndcg_at_10
+ value: 79.12599999999999
+ - type: ndcg_at_100
+ value: 80.319
+ - type: ndcg_at_1000
+ value: 80.71199999999999
+ - type: ndcg_at_3
+ value: 76.209
+ - type: ndcg_at_5
+ value: 77.90700000000001
+ - type: precision_at_1
+ value: 68.497
+ - type: precision_at_10
+ value: 9.958
+ - type: precision_at_100
+ value: 1.077
+ - type: precision_at_1000
+ value: 0.11299999999999999
+ - type: precision_at_3
+ value: 29.908
+ - type: precision_at_5
+ value: 18.971
+ - type: recall_at_1
+ value: 63.564
+ - type: recall_at_10
+ value: 90.05199999999999
+ - type: recall_at_100
+ value: 95.028
+ - type: recall_at_1000
+ value: 97.667
+ - type: recall_at_3
+ value: 82.17999999999999
+ - type: recall_at_5
+ value: 86.388
+ - task:
+ type: Retrieval
+ dataset:
+ type: fiqa
+ name: MTEB FiQA2018
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 19.042
+ - type: map_at_10
+ value: 30.764999999999997
+ - type: map_at_100
+ value: 32.678000000000004
+ - type: map_at_1000
+ value: 32.881
+ - type: map_at_3
+ value: 26.525
+ - type: map_at_5
+ value: 28.932000000000002
+ - type: mrr_at_1
+ value: 37.653999999999996
+ - type: mrr_at_10
+ value: 46.597
+ - type: mrr_at_100
+ value: 47.413
+ - type: mrr_at_1000
+ value: 47.453
+ - type: mrr_at_3
+ value: 43.775999999999996
+ - type: mrr_at_5
+ value: 45.489000000000004
+ - type: ndcg_at_1
+ value: 37.653999999999996
+ - type: ndcg_at_10
+ value: 38.615
+ - type: ndcg_at_100
+ value: 45.513999999999996
+ - type: ndcg_at_1000
+ value: 48.815999999999995
+ - type: ndcg_at_3
+ value: 34.427
+ - type: ndcg_at_5
+ value: 35.954
+ - type: precision_at_1
+ value: 37.653999999999996
+ - type: precision_at_10
+ value: 10.864
+ - type: precision_at_100
+ value: 1.7850000000000001
+ - type: precision_at_1000
+ value: 0.23800000000000002
+ - type: precision_at_3
+ value: 22.788
+ - type: precision_at_5
+ value: 17.346
+ - type: recall_at_1
+ value: 19.042
+ - type: recall_at_10
+ value: 45.707
+ - type: recall_at_100
+ value: 71.152
+ - type: recall_at_1000
+ value: 90.7
+ - type: recall_at_3
+ value: 30.814000000000004
+ - type: recall_at_5
+ value: 37.478
+ - task:
+ type: Retrieval
+ dataset:
+ type: hotpotqa
+ name: MTEB HotpotQA
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 38.001000000000005
+ - type: map_at_10
+ value: 59.611000000000004
+ - type: map_at_100
+ value: 60.582
+ - type: map_at_1000
+ value: 60.646
+ - type: map_at_3
+ value: 56.031
+ - type: map_at_5
+ value: 58.243
+ - type: mrr_at_1
+ value: 76.003
+ - type: mrr_at_10
+ value: 82.15400000000001
+ - type: mrr_at_100
+ value: 82.377
+ - type: mrr_at_1000
+ value: 82.383
+ - type: mrr_at_3
+ value: 81.092
+ - type: mrr_at_5
+ value: 81.742
+ - type: ndcg_at_1
+ value: 76.003
+ - type: ndcg_at_10
+ value: 68.216
+ - type: ndcg_at_100
+ value: 71.601
+ - type: ndcg_at_1000
+ value: 72.821
+ - type: ndcg_at_3
+ value: 63.109
+ - type: ndcg_at_5
+ value: 65.902
+ - type: precision_at_1
+ value: 76.003
+ - type: precision_at_10
+ value: 14.379
+ - type: precision_at_100
+ value: 1.702
+ - type: precision_at_1000
+ value: 0.186
+ - type: precision_at_3
+ value: 40.396
+ - type: precision_at_5
+ value: 26.442
+ - type: recall_at_1
+ value: 38.001000000000005
+ - type: recall_at_10
+ value: 71.897
+ - type: recall_at_100
+ value: 85.105
+ - type: recall_at_1000
+ value: 93.133
+ - type: recall_at_3
+ value: 60.594
+ - type: recall_at_5
+ value: 66.104
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/imdb
+ name: MTEB ImdbClassification
+ config: default
+ split: test
+ revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
+ metrics:
+ - type: accuracy
+ value: 91.31280000000001
+ - type: ap
+ value: 87.53723467501632
+ - type: f1
+ value: 91.30282906596291
+ - task:
+ type: Retrieval
+ dataset:
+ type: msmarco
+ name: MTEB MSMARCO
+ config: default
+ split: dev
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 21.917
+ - type: map_at_10
+ value: 34.117999999999995
+ - type: map_at_100
+ value: 35.283
+ - type: map_at_1000
+ value: 35.333999999999996
+ - type: map_at_3
+ value: 30.330000000000002
+ - type: map_at_5
+ value: 32.461
+ - type: mrr_at_1
+ value: 22.579
+ - type: mrr_at_10
+ value: 34.794000000000004
+ - type: mrr_at_100
+ value: 35.893
+ - type: mrr_at_1000
+ value: 35.937000000000005
+ - type: mrr_at_3
+ value: 31.091
+ - type: mrr_at_5
+ value: 33.173
+ - type: ndcg_at_1
+ value: 22.579
+ - type: ndcg_at_10
+ value: 40.951
+ - type: ndcg_at_100
+ value: 46.558
+ - type: ndcg_at_1000
+ value: 47.803000000000004
+ - type: ndcg_at_3
+ value: 33.262
+ - type: ndcg_at_5
+ value: 37.036
+ - type: precision_at_1
+ value: 22.579
+ - type: precision_at_10
+ value: 6.463000000000001
+ - type: precision_at_100
+ value: 0.928
+ - type: precision_at_1000
+ value: 0.104
+ - type: precision_at_3
+ value: 14.174000000000001
+ - type: precision_at_5
+ value: 10.421
+ - type: recall_at_1
+ value: 21.917
+ - type: recall_at_10
+ value: 61.885
+ - type: recall_at_100
+ value: 87.847
+ - type: recall_at_1000
+ value: 97.322
+ - type: recall_at_3
+ value: 41.010000000000005
+ - type: recall_at_5
+ value: 50.031000000000006
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/mtop_domain
+ name: MTEB MTOPDomainClassification (en)
+ config: en
+ split: test
+ revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
+ metrics:
+ - type: accuracy
+ value: 93.49521203830369
+ - type: f1
+ value: 93.30882341740241
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/mtop_intent
+ name: MTEB MTOPIntentClassification (en)
+ config: en
+ split: test
+ revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
+ metrics:
+ - type: accuracy
+ value: 71.0579115367077
+ - type: f1
+ value: 51.2368258319339
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_massive_intent
+ name: MTEB MassiveIntentClassification (en)
+ config: en
+ split: test
+ revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
+ metrics:
+ - type: accuracy
+ value: 73.88029589778077
+ - type: f1
+ value: 72.34422048584663
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_massive_scenario
+ name: MTEB MassiveScenarioClassification (en)
+ config: en
+ split: test
+ revision: 7d571f92784cd94a019292a1f45445077d0ef634
+ metrics:
+ - type: accuracy
+ value: 78.2817753866846
+ - type: f1
+ value: 77.87746050004304
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/medrxiv-clustering-p2p
+ name: MTEB MedrxivClusteringP2P
+ config: default
+ split: test
+ revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
+ metrics:
+ - type: v_measure
+ value: 33.247341454119216
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/medrxiv-clustering-s2s
+ name: MTEB MedrxivClusteringS2S
+ config: default
+ split: test
+ revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
+ metrics:
+ - type: v_measure
+ value: 31.9647477166234
+ - task:
+ type: Reranking
+ dataset:
+ type: mteb/mind_small
+ name: MTEB MindSmallReranking
+ config: default
+ split: test
+ revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
+ metrics:
+ - type: map
+ value: 31.90698374676892
+ - type: mrr
+ value: 33.07523683771251
+ - task:
+ type: Retrieval
+ dataset:
+ type: nfcorpus
+ name: MTEB NFCorpus
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 6.717
+ - type: map_at_10
+ value: 14.566
+ - type: map_at_100
+ value: 18.465999999999998
+ - type: map_at_1000
+ value: 20.033
+ - type: map_at_3
+ value: 10.863
+ - type: map_at_5
+ value: 12.589
+ - type: mrr_at_1
+ value: 49.845
+ - type: mrr_at_10
+ value: 58.385
+ - type: mrr_at_100
+ value: 58.989999999999995
+ - type: mrr_at_1000
+ value: 59.028999999999996
+ - type: mrr_at_3
+ value: 56.76
+ - type: mrr_at_5
+ value: 57.766
+ - type: ndcg_at_1
+ value: 47.678
+ - type: ndcg_at_10
+ value: 37.511
+ - type: ndcg_at_100
+ value: 34.537
+ - type: ndcg_at_1000
+ value: 43.612
+ - type: ndcg_at_3
+ value: 43.713
+ - type: ndcg_at_5
+ value: 41.303
+ - type: precision_at_1
+ value: 49.845
+ - type: precision_at_10
+ value: 27.307
+ - type: precision_at_100
+ value: 8.746
+ - type: precision_at_1000
+ value: 2.182
+ - type: precision_at_3
+ value: 40.764
+ - type: precision_at_5
+ value: 35.232
+ - type: recall_at_1
+ value: 6.717
+ - type: recall_at_10
+ value: 18.107
+ - type: recall_at_100
+ value: 33.759
+ - type: recall_at_1000
+ value: 67.31
+ - type: recall_at_3
+ value: 11.68
+ - type: recall_at_5
+ value: 14.557999999999998
+ - task:
+ type: Retrieval
+ dataset:
+ type: nq
+ name: MTEB NQ
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 27.633999999999997
+ - type: map_at_10
+ value: 42.400999999999996
+ - type: map_at_100
+ value: 43.561
+ - type: map_at_1000
+ value: 43.592
+ - type: map_at_3
+ value: 37.865
+ - type: map_at_5
+ value: 40.650999999999996
+ - type: mrr_at_1
+ value: 31.286
+ - type: mrr_at_10
+ value: 44.996
+ - type: mrr_at_100
+ value: 45.889
+ - type: mrr_at_1000
+ value: 45.911
+ - type: mrr_at_3
+ value: 41.126000000000005
+ - type: mrr_at_5
+ value: 43.536
+ - type: ndcg_at_1
+ value: 31.257
+ - type: ndcg_at_10
+ value: 50.197
+ - type: ndcg_at_100
+ value: 55.062
+ - type: ndcg_at_1000
+ value: 55.81700000000001
+ - type: ndcg_at_3
+ value: 41.650999999999996
+ - type: ndcg_at_5
+ value: 46.324
+ - type: precision_at_1
+ value: 31.257
+ - type: precision_at_10
+ value: 8.508000000000001
+ - type: precision_at_100
+ value: 1.121
+ - type: precision_at_1000
+ value: 0.11900000000000001
+ - type: precision_at_3
+ value: 19.1
+ - type: precision_at_5
+ value: 14.16
+ - type: recall_at_1
+ value: 27.633999999999997
+ - type: recall_at_10
+ value: 71.40100000000001
+ - type: recall_at_100
+ value: 92.463
+ - type: recall_at_1000
+ value: 98.13199999999999
+ - type: recall_at_3
+ value: 49.382
+ - type: recall_at_5
+ value: 60.144
+ - task:
+ type: Retrieval
+ dataset:
+ type: quora
+ name: MTEB QuoraRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 71.17099999999999
+ - type: map_at_10
+ value: 85.036
+ - type: map_at_100
+ value: 85.67099999999999
+ - type: map_at_1000
+ value: 85.68599999999999
+ - type: map_at_3
+ value: 82.086
+ - type: map_at_5
+ value: 83.956
+ - type: mrr_at_1
+ value: 82.04
+ - type: mrr_at_10
+ value: 88.018
+ - type: mrr_at_100
+ value: 88.114
+ - type: mrr_at_1000
+ value: 88.115
+ - type: mrr_at_3
+ value: 87.047
+ - type: mrr_at_5
+ value: 87.73100000000001
+ - type: ndcg_at_1
+ value: 82.03
+ - type: ndcg_at_10
+ value: 88.717
+ - type: ndcg_at_100
+ value: 89.904
+ - type: ndcg_at_1000
+ value: 89.991
+ - type: ndcg_at_3
+ value: 85.89099999999999
+ - type: ndcg_at_5
+ value: 87.485
+ - type: precision_at_1
+ value: 82.03
+ - type: precision_at_10
+ value: 13.444999999999999
+ - type: precision_at_100
+ value: 1.533
+ - type: precision_at_1000
+ value: 0.157
+ - type: precision_at_3
+ value: 37.537
+ - type: precision_at_5
+ value: 24.692
+ - type: recall_at_1
+ value: 71.17099999999999
+ - type: recall_at_10
+ value: 95.634
+ - type: recall_at_100
+ value: 99.614
+ - type: recall_at_1000
+ value: 99.99
+ - type: recall_at_3
+ value: 87.48
+ - type: recall_at_5
+ value: 91.996
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/reddit-clustering
+ name: MTEB RedditClustering
+ config: default
+ split: test
+ revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
+ metrics:
+ - type: v_measure
+ value: 55.067219624685315
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/reddit-clustering-p2p
+ name: MTEB RedditClusteringP2P
+ config: default
+ split: test
+ revision: 282350215ef01743dc01b456c7f5241fa8937f16
+ metrics:
+ - type: v_measure
+ value: 62.121822992300444
+ - task:
+ type: Retrieval
+ dataset:
+ type: scidocs
+ name: MTEB SCIDOCS
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 4.153
+ - type: map_at_10
+ value: 11.024000000000001
+ - type: map_at_100
+ value: 13.233
+ - type: map_at_1000
+ value: 13.62
+ - type: map_at_3
+ value: 7.779999999999999
+ - type: map_at_5
+ value: 9.529
+ - type: mrr_at_1
+ value: 20.599999999999998
+ - type: mrr_at_10
+ value: 31.361
+ - type: mrr_at_100
+ value: 32.738
+ - type: mrr_at_1000
+ value: 32.792
+ - type: mrr_at_3
+ value: 28.15
+ - type: mrr_at_5
+ value: 30.085
+ - type: ndcg_at_1
+ value: 20.599999999999998
+ - type: ndcg_at_10
+ value: 18.583
+ - type: ndcg_at_100
+ value: 27.590999999999998
+ - type: ndcg_at_1000
+ value: 34.001
+ - type: ndcg_at_3
+ value: 17.455000000000002
+ - type: ndcg_at_5
+ value: 15.588
+ - type: precision_at_1
+ value: 20.599999999999998
+ - type: precision_at_10
+ value: 9.74
+ - type: precision_at_100
+ value: 2.284
+ - type: precision_at_1000
+ value: 0.381
+ - type: precision_at_3
+ value: 16.533
+ - type: precision_at_5
+ value: 14.02
+ - type: recall_at_1
+ value: 4.153
+ - type: recall_at_10
+ value: 19.738
+ - type: recall_at_100
+ value: 46.322
+ - type: recall_at_1000
+ value: 77.378
+ - type: recall_at_3
+ value: 10.048
+ - type: recall_at_5
+ value: 14.233
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sickr-sts
+ name: MTEB SICK-R
+ config: default
+ split: test
+ revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
+ metrics:
+ - type: cos_sim_pearson
+ value: 85.07097501003639
+ - type: cos_sim_spearman
+ value: 81.05827848407056
+ - type: euclidean_pearson
+ value: 82.6279003372546
+ - type: euclidean_spearman
+ value: 81.00031515279802
+ - type: manhattan_pearson
+ value: 82.59338284959495
+ - type: manhattan_spearman
+ value: 80.97432711064945
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts12-sts
+ name: MTEB STS12
+ config: default
+ split: test
+ revision: a0d554a64d88156834ff5ae9920b964011b16384
+ metrics:
+ - type: cos_sim_pearson
+ value: 86.28991993621685
+ - type: cos_sim_spearman
+ value: 78.71828082424351
+ - type: euclidean_pearson
+ value: 83.4881331520832
+ - type: euclidean_spearman
+ value: 78.51746826842316
+ - type: manhattan_pearson
+ value: 83.4109223774324
+ - type: manhattan_spearman
+ value: 78.431544382179
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts13-sts
+ name: MTEB STS13
+ config: default
+ split: test
+ revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
+ metrics:
+ - type: cos_sim_pearson
+ value: 83.16651661072123
+ - type: cos_sim_spearman
+ value: 84.88094386637867
+ - type: euclidean_pearson
+ value: 84.3547603585416
+ - type: euclidean_spearman
+ value: 84.85148665860193
+ - type: manhattan_pearson
+ value: 84.29648369879266
+ - type: manhattan_spearman
+ value: 84.76074870571124
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts14-sts
+ name: MTEB STS14
+ config: default
+ split: test
+ revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
+ metrics:
+ - type: cos_sim_pearson
+ value: 83.40596254292149
+ - type: cos_sim_spearman
+ value: 83.10699573133829
+ - type: euclidean_pearson
+ value: 83.22794776876958
+ - type: euclidean_spearman
+ value: 83.22583316084712
+ - type: manhattan_pearson
+ value: 83.15899233935681
+ - type: manhattan_spearman
+ value: 83.17668293648019
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts15-sts
+ name: MTEB STS15
+ config: default
+ split: test
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
+ metrics:
+ - type: cos_sim_pearson
+ value: 87.27977121352563
+ - type: cos_sim_spearman
+ value: 88.73903130248591
+ - type: euclidean_pearson
+ value: 88.30685958438735
+ - type: euclidean_spearman
+ value: 88.79755484280406
+ - type: manhattan_pearson
+ value: 88.30305607758652
+ - type: manhattan_spearman
+ value: 88.80096577072784
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts16-sts
+ name: MTEB STS16
+ config: default
+ split: test
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
+ metrics:
+ - type: cos_sim_pearson
+ value: 84.08819031430218
+ - type: cos_sim_spearman
+ value: 86.35414445951125
+ - type: euclidean_pearson
+ value: 85.4683192388315
+ - type: euclidean_spearman
+ value: 86.2079674669473
+ - type: manhattan_pearson
+ value: 85.35835702257341
+ - type: manhattan_spearman
+ value: 86.08483380002187
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts17-crosslingual-sts
+ name: MTEB STS17 (en-en)
+ config: en-en
+ split: test
+ revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
+ metrics:
+ - type: cos_sim_pearson
+ value: 87.36149449801478
+ - type: cos_sim_spearman
+ value: 87.7102980757725
+ - type: euclidean_pearson
+ value: 88.16457177837161
+ - type: euclidean_spearman
+ value: 87.6598652482716
+ - type: manhattan_pearson
+ value: 88.23894728971618
+ - type: manhattan_spearman
+ value: 87.74470156709361
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts22-crosslingual-sts
+ name: MTEB STS22 (en)
+ config: en
+ split: test
+ revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
+ metrics:
+ - type: cos_sim_pearson
+ value: 64.54023758394433
+ - type: cos_sim_spearman
+ value: 66.28491960187773
+ - type: euclidean_pearson
+ value: 67.0853128483472
+ - type: euclidean_spearman
+ value: 66.10307543766307
+ - type: manhattan_pearson
+ value: 66.7635365592556
+ - type: manhattan_spearman
+ value: 65.76408004780167
+ - task:
+ type: STS
+ dataset:
+ type: mteb/stsbenchmark-sts
+ name: MTEB STSBenchmark
+ config: default
+ split: test
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
+ metrics:
+ - type: cos_sim_pearson
+ value: 85.15858398195317
+ - type: cos_sim_spearman
+ value: 87.44850004752102
+ - type: euclidean_pearson
+ value: 86.60737082550408
+ - type: euclidean_spearman
+ value: 87.31591549824242
+ - type: manhattan_pearson
+ value: 86.56187011429977
+ - type: manhattan_spearman
+ value: 87.23854795795319
+ - task:
+ type: Reranking
+ dataset:
+ type: mteb/scidocs-reranking
+ name: MTEB SciDocsRR
+ config: default
+ split: test
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
+ metrics:
+ - type: map
+ value: 86.66210488769109
+ - type: mrr
+ value: 96.23100664767331
+ - task:
+ type: Retrieval
+ dataset:
+ type: scifact
+ name: MTEB SciFact
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 56.094
+ - type: map_at_10
+ value: 67.486
+ - type: map_at_100
+ value: 67.925
+ - type: map_at_1000
+ value: 67.949
+ - type: map_at_3
+ value: 64.857
+ - type: map_at_5
+ value: 66.31
+ - type: mrr_at_1
+ value: 58.667
+ - type: mrr_at_10
+ value: 68.438
+ - type: mrr_at_100
+ value: 68.733
+ - type: mrr_at_1000
+ value: 68.757
+ - type: mrr_at_3
+ value: 66.389
+ - type: mrr_at_5
+ value: 67.456
+ - type: ndcg_at_1
+ value: 58.667
+ - type: ndcg_at_10
+ value: 72.506
+ - type: ndcg_at_100
+ value: 74.27
+ - type: ndcg_at_1000
+ value: 74.94800000000001
+ - type: ndcg_at_3
+ value: 67.977
+ - type: ndcg_at_5
+ value: 70.028
+ - type: precision_at_1
+ value: 58.667
+ - type: precision_at_10
+ value: 9.767000000000001
+ - type: precision_at_100
+ value: 1.073
+ - type: precision_at_1000
+ value: 0.11299999999999999
+ - type: precision_at_3
+ value: 27.0
+ - type: precision_at_5
+ value: 17.666999999999998
+ - type: recall_at_1
+ value: 56.094
+ - type: recall_at_10
+ value: 86.68900000000001
+ - type: recall_at_100
+ value: 94.333
+ - type: recall_at_1000
+ value: 99.667
+ - type: recall_at_3
+ value: 74.522
+ - type: recall_at_5
+ value: 79.611
+ - task:
+ type: PairClassification
+ dataset:
+ type: mteb/sprintduplicatequestions-pairclassification
+ name: MTEB SprintDuplicateQuestions
+ config: default
+ split: test
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
+ metrics:
+ - type: cos_sim_accuracy
+ value: 99.83069306930693
+ - type: cos_sim_ap
+ value: 95.69184662911199
+ - type: cos_sim_f1
+ value: 91.4027149321267
+ - type: cos_sim_precision
+ value: 91.91102123356926
+ - type: cos_sim_recall
+ value: 90.9
+ - type: dot_accuracy
+ value: 99.69405940594059
+ - type: dot_ap
+ value: 90.21674151456216
+ - type: dot_f1
+ value: 84.4489179667841
+ - type: dot_precision
+ value: 85.00506585612969
+ - type: dot_recall
+ value: 83.89999999999999
+ - type: euclidean_accuracy
+ value: 99.83069306930693
+ - type: euclidean_ap
+ value: 95.67760109671087
+ - type: euclidean_f1
+ value: 91.19754350051177
+ - type: euclidean_precision
+ value: 93.39622641509435
+ - type: euclidean_recall
+ value: 89.1
+ - type: manhattan_accuracy
+ value: 99.83267326732673
+ - type: manhattan_ap
+ value: 95.69771347732625
+ - type: manhattan_f1
+ value: 91.32420091324201
+ - type: manhattan_precision
+ value: 92.68795056642637
+ - type: manhattan_recall
+ value: 90.0
+ - type: max_accuracy
+ value: 99.83267326732673
+ - type: max_ap
+ value: 95.69771347732625
+ - type: max_f1
+ value: 91.4027149321267
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/stackexchange-clustering
+ name: MTEB StackExchangeClustering
+ config: default
+ split: test
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
+ metrics:
+ - type: v_measure
+ value: 64.47378332953092
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/stackexchange-clustering-p2p
+ name: MTEB StackExchangeClusteringP2P
+ config: default
+ split: test
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
+ metrics:
+ - type: v_measure
+ value: 33.79602531604151
+ - task:
2286
+ type: Reranking
2287
+ dataset:
2288
+ type: mteb/stackoverflowdupquestions-reranking
2289
+ name: MTEB StackOverflowDupQuestions
2290
+ config: default
2291
+ split: test
2292
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
2293
+ metrics:
2294
+ - type: map
2295
+ value: 53.80707639107175
2296
+ - type: mrr
2297
+ value: 54.64886522790935
2298
+ - task:
2299
+ type: Summarization
2300
+ dataset:
2301
+ type: mteb/summeval
2302
+ name: MTEB SummEval
2303
+ config: default
2304
+ split: test
2305
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
2306
+ metrics:
2307
+ - type: cos_sim_pearson
2308
+ value: 30.852448373051395
2309
+ - type: cos_sim_spearman
2310
+ value: 32.51821499493775
2311
+ - type: dot_pearson
2312
+ value: 30.390650062190456
2313
+ - type: dot_spearman
2314
+ value: 30.588836159667636
2315
+ - task:
+ type: Retrieval
+ dataset:
+ type: trec-covid
+ name: MTEB TRECCOVID
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 0.198
+ - type: map_at_10
+ value: 1.51
+ - type: map_at_100
+ value: 8.882
+ - type: map_at_1000
+ value: 22.181
+ - type: map_at_3
+ value: 0.553
+ - type: map_at_5
+ value: 0.843
+ - type: mrr_at_1
+ value: 74.0
+ - type: mrr_at_10
+ value: 84.89999999999999
+ - type: mrr_at_100
+ value: 84.89999999999999
+ - type: mrr_at_1000
+ value: 84.89999999999999
+ - type: mrr_at_3
+ value: 84.0
+ - type: mrr_at_5
+ value: 84.89999999999999
+ - type: ndcg_at_1
+ value: 68.0
+ - type: ndcg_at_10
+ value: 64.792
+ - type: ndcg_at_100
+ value: 51.37199999999999
+ - type: ndcg_at_1000
+ value: 47.392
+ - type: ndcg_at_3
+ value: 68.46900000000001
+ - type: ndcg_at_5
+ value: 67.084
+ - type: precision_at_1
+ value: 74.0
+ - type: precision_at_10
+ value: 69.39999999999999
+ - type: precision_at_100
+ value: 53.080000000000005
+ - type: precision_at_1000
+ value: 21.258
+ - type: precision_at_3
+ value: 76.0
+ - type: precision_at_5
+ value: 73.2
+ - type: recall_at_1
+ value: 0.198
+ - type: recall_at_10
+ value: 1.7950000000000002
+ - type: recall_at_100
+ value: 12.626999999999999
+ - type: recall_at_1000
+ value: 44.84
+ - type: recall_at_3
+ value: 0.611
+ - type: recall_at_5
+ value: 0.959
+ - task:
+ type: Retrieval
+ dataset:
+ type: webis-touche2020
+ name: MTEB Touche2020
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 1.4949999999999999
+ - type: map_at_10
+ value: 8.797
+ - type: map_at_100
+ value: 14.889
+ - type: map_at_1000
+ value: 16.309
+ - type: map_at_3
+ value: 4.389
+ - type: map_at_5
+ value: 6.776
+ - type: mrr_at_1
+ value: 18.367
+ - type: mrr_at_10
+ value: 35.844
+ - type: mrr_at_100
+ value: 37.119
+ - type: mrr_at_1000
+ value: 37.119
+ - type: mrr_at_3
+ value: 30.612000000000002
+ - type: mrr_at_5
+ value: 33.163
+ - type: ndcg_at_1
+ value: 16.326999999999998
+ - type: ndcg_at_10
+ value: 21.9
+ - type: ndcg_at_100
+ value: 34.705000000000005
+ - type: ndcg_at_1000
+ value: 45.709
+ - type: ndcg_at_3
+ value: 22.7
+ - type: ndcg_at_5
+ value: 23.197000000000003
+ - type: precision_at_1
+ value: 18.367
+ - type: precision_at_10
+ value: 21.02
+ - type: precision_at_100
+ value: 7.714
+ - type: precision_at_1000
+ value: 1.504
+ - type: precision_at_3
+ value: 26.531
+ - type: precision_at_5
+ value: 26.122
+ - type: recall_at_1
+ value: 1.4949999999999999
+ - type: recall_at_10
+ value: 15.504000000000001
+ - type: recall_at_100
+ value: 47.978
+ - type: recall_at_1000
+ value: 81.56
+ - type: recall_at_3
+ value: 5.569
+ - type: recall_at_5
+ value: 9.821
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/toxic_conversations_50k
+ name: MTEB ToxicConversationsClassification
+ config: default
+ split: test
+ revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
+ metrics:
+ - type: accuracy
+ value: 72.99279999999999
+ - type: ap
+ value: 15.459189680101492
+ - type: f1
+ value: 56.33023271441895
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/tweet_sentiment_extraction
+ name: MTEB TweetSentimentExtractionClassification
+ config: default
+ split: test
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
+ metrics:
+ - type: accuracy
+ value: 63.070175438596486
+ - type: f1
+ value: 63.28070758709465
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/twentynewsgroups-clustering
+ name: MTEB TwentyNewsgroupsClustering
+ config: default
+ split: test
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
+ metrics:
+ - type: v_measure
+ value: 50.076231309703054
+ - task:
+ type: PairClassification
+ dataset:
+ type: mteb/twittersemeval2015-pairclassification
+ name: MTEB TwitterSemEval2015
+ config: default
+ split: test
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
+ metrics:
+ - type: cos_sim_accuracy
+ value: 87.21463908922931
+ - type: cos_sim_ap
+ value: 77.67287017966282
+ - type: cos_sim_f1
+ value: 70.34412955465588
+ - type: cos_sim_precision
+ value: 67.57413709285368
+ - type: cos_sim_recall
+ value: 73.35092348284961
+ - type: dot_accuracy
+ value: 85.04500208618943
+ - type: dot_ap
+ value: 70.4075203869744
+ - type: dot_f1
+ value: 66.18172537008678
+ - type: dot_precision
+ value: 64.08798813643104
+ - type: dot_recall
+ value: 68.41688654353561
+ - type: euclidean_accuracy
+ value: 87.17887584192646
+ - type: euclidean_ap
+ value: 77.5774128274464
+ - type: euclidean_f1
+ value: 70.09307972480777
+ - type: euclidean_precision
+ value: 71.70852884349986
+ - type: euclidean_recall
+ value: 68.54881266490766
+ - type: manhattan_accuracy
+ value: 87.28020504261787
+ - type: manhattan_ap
+ value: 77.57835820297892
+ - type: manhattan_f1
+ value: 70.23063591521131
+ - type: manhattan_precision
+ value: 70.97817299919159
+ - type: manhattan_recall
+ value: 69.49868073878628
+ - type: max_accuracy
+ value: 87.28020504261787
+ - type: max_ap
+ value: 77.67287017966282
+ - type: max_f1
+ value: 70.34412955465588
+ - task:
+ type: PairClassification
+ dataset:
+ type: mteb/twitterurlcorpus-pairclassification
+ name: MTEB TwitterURLCorpus
+ config: default
+ split: test
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
+ metrics:
+ - type: cos_sim_accuracy
+ value: 88.96650754841464
+ - type: cos_sim_ap
+ value: 86.00185968965064
+ - type: cos_sim_f1
+ value: 77.95861256351718
+ - type: cos_sim_precision
+ value: 74.70712773465067
+ - type: cos_sim_recall
+ value: 81.50600554357868
+ - type: dot_accuracy
+ value: 87.36950362867233
+ - type: dot_ap
+ value: 82.22071181147555
+ - type: dot_f1
+ value: 74.85680716698488
+ - type: dot_precision
+ value: 71.54688377316114
+ - type: dot_recall
+ value: 78.48783492454572
+ - type: euclidean_accuracy
+ value: 88.99561454573679
+ - type: euclidean_ap
+ value: 86.15882097229648
+ - type: euclidean_f1
+ value: 78.18463125322332
+ - type: euclidean_precision
+ value: 74.95408956067241
+ - type: euclidean_recall
+ value: 81.70619032953496
+ - type: manhattan_accuracy
+ value: 88.96650754841464
+ - type: manhattan_ap
+ value: 86.13133111232099
+ - type: manhattan_f1
+ value: 78.10771470160115
+ - type: manhattan_precision
+ value: 74.05465084184377
+ - type: manhattan_recall
+ value: 82.63012011087157
+ - type: max_accuracy
+ value: 88.99561454573679
+ - type: max_ap
+ value: 86.15882097229648
+ - type: max_f1
+ value: 78.18463125322332
+ language:
+ - en
  license: mit
  ---
+
+ ## stella model
+
+ **News**
+
+ **[2023-10-19]** Released stella-base-en-v2. It is simple to use and **does not need any prefix text**.\
+ **[2023-10-12]** Released stella-base-zh-v2 and stella-large-zh-v2. Both perform better than the first-version stella models and **do not need any prefix text**.\
+ **[2023-09-11]** Released stella-base-zh and stella-large-zh.
+
+ stella is a general-purpose text encoder. It mainly includes the following models:
+
+ | Model Name | Model Size (GB) | Dimension | Sequence Length | Language | Need instruction for retrieval? |
+ |:------------------:|:---------------:|:---------:|:---------------:|:--------:|:-------------------------------:|
+ | stella-base-en-v2 | 0.2 | 768 | 512 | English | No |
+ | stella-large-zh-v2 | 0.65 | 1024 | 1024 | Chinese | No |
+ | stella-base-zh-v2 | 0.2 | 768 | 1024 | Chinese | No |
+ | stella-large-zh | 0.65 | 1024 | 1024 | Chinese | Yes |
+ | stella-base-zh | 0.2 | 768 | 1024 | Chinese | Yes |
+
+ The full training approach and process are documented in a [blog post](https://zhuanlan.zhihu.com/p/655322183) (in Chinese); comments and discussion are welcome.
+
+ **Training data:**
+
+ 1. Open-source data (wudao_base_200GB[1], m3e[2], and simclue[3]), with a focus on selecting texts longer than 512.
+ 2. (question, paragraph) and (sentence, paragraph) pairs constructed from a general corpus with an LLM.
+
+ **Training methods:**
+
+ 1. Contrastive learning loss.
+ 2. Contrastive learning loss with hard negatives (hard negatives mined separately with BM25 and with vector retrieval).
+ 3. EWC (Elastic Weight Consolidation)[4].
+ 4. CoSENT loss[5].
+ 5. One iterator per data type; the loss is computed and the model is updated separately for each type.
+
+ stella-v2 builds on the stella models: it uses more training data and, through knowledge distillation and related methods, removes the need for a leading instruction (for example piccolo's `查询:` ("query:") and `结果:` ("result:") prefixes, or e5's `query:` and `passage:`).
+
+ **Model weight initialization:**\
+ stella-base-zh and stella-large-zh use piccolo-base-zh[6] and piccolo-large-zh, respectively, as their base models. The position embeddings for positions 512-1024 are initialized with hierarchically decomposed position encoding[7].\
+ Thanks to SenseTime Research for open-sourcing the [piccolo models](https://huggingface.co/sensenova).
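The loss functions listed above (contrastive loss, contrastive loss with hard negatives, EWC, and CoSENT) can be illustrated with the NumPy sketch below. It is a minimal, non-authoritative illustration, not stella's training code: the temperature of 0.05, sharing the mined hard negatives across the whole batch, and the function names are assumptions; the CoSENT scale of 20 follows the CoSENT post [5]; and the EWC penalty assumes the anchor weights θ* are those of the starting model and that a diagonal Fisher estimate F is available.

```python
from typing import Optional

import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def info_nce_loss(
    q: np.ndarray,
    p: np.ndarray,
    hard_neg: Optional[np.ndarray] = None,
    temperature: float = 0.05,  # assumed value
) -> float:
    """Contrastive (InfoNCE) loss with in-batch negatives.

    q, p: (B, D) embeddings; p[i] is the positive for q[i], and the other rows of p
    act as in-batch negatives. hard_neg: optional (M, D) extra negatives (e.g. mined
    with BM25 or vector search), added as candidates for every query (an assumption).
    """
    q, p = l2_normalize(q), l2_normalize(p)
    cands = p if hard_neg is None else np.concatenate([p, l2_normalize(hard_neg)])
    logits = q @ cands.T / temperature                    # (B, B + M) scaled cosine sims
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(q))
    return float(-log_probs[idx, idx].mean())             # target for q[i] is column i


def cosent_loss(cos_sim, labels, scale: float = 20.0) -> float:
    """CoSENT: log(1 + sum over labels[i] > labels[j] of exp(scale * (cos_j - cos_i))).

    cos_sim[i] is the cosine similarity of sentence pair i; labels[i] is its gold score.
    """
    cos_sim = np.asarray(cos_sim, dtype=np.float64)
    labels = np.asarray(labels)
    diff = scale * (cos_sim[None, :] - cos_sim[:, None])  # diff[i, j] = scale * (cos_j - cos_i)
    mask = labels[:, None] > labels[None, :]              # pair i should score above pair j
    terms = np.concatenate([[0.0], diff[mask]])           # exp(0) = 1 supplies the "1 +"
    return float(np.logaddexp.reduce(terms))


def ewc_penalty(params, anchor_params, fisher, lam: float = 1.0) -> float:
    """EWC regularizer: (lam / 2) * sum_k F_k * (theta_k - theta*_k) ** 2."""
    params, anchor_params, fisher = map(np.asarray, (params, anchor_params, fisher))
    return float(0.5 * lam * np.sum(fisher * (params - anchor_params) ** 2))
```

In a real training loop these terms would be computed on framework tensors so gradients can flow; NumPy is used here only to make the arithmetic easy to check. For instance, adding a copy of each positive as a "hard negative" caps the probability of the true positive at about one half, so the contrastive loss rises to roughly log 2.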
+
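The 512-1024 position-embedding initialization mentioned above (hierarchically decomposed position encoding, reference 7) builds embeddings for positions beyond the trained range out of the existing ones. The sketch below follows the construction described in that reference: with trained embeddings `p[0..n-1]`, define `u[i] = (p[i] - α·p[0]) / (1 - α)` and set `q[i·n + j] = α·u[i] + (1 - α)·u[j]`, which makes the first `n` new rows reproduce `p` exactly. The value α = 0.4 and the function name are assumptions, not taken from stella's code.

```python
import numpy as np


def extend_position_embeddings(p: np.ndarray, new_len: int, alpha: float = 0.4) -> np.ndarray:
    """Hierarchically decomposed position encoding (sketch).

    p: (n, d) trained absolute position embeddings for positions 0..n-1.
    Returns (new_len, d) embeddings, new_len <= n * n, where
        u[i] = (p[i] - alpha * p[0]) / (1 - alpha)
        q[i * n + j] = alpha * u[i] + (1 - alpha) * u[j]
    so that q[:n] equals p.
    """
    n = p.shape[0]
    if new_len > n * n:
        raise ValueError("new_len must be at most n * n")
    u = (p - alpha * p[0]) / (1 - alpha)
    pos = np.arange(new_len)
    i, j = pos // n, pos % n  # "outer" and "inner" index of each new position
    return alpha * u[i] + (1 - alpha) * u[j]
```

For a BERT-style model with 512 trained positions, `extend_position_embeddings(old_embeddings, 1024)` keeps positions 0-511 unchanged and builds 512-1023 from `u[1]` combined with `u[0..511]`; the result would then be copied into the model's position-embedding matrix before further training.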
+ ## Metric
+
+ #### C-MTEB leaderboard (Chinese)
+
+ | Model Name | Model Size (GB) | Dimension | Sequence Length | Average (35) | Classification (9) | Clustering (4) | Pair Classification (2) | Reranking (4) | Retrieval (8) | STS (8) |
+ |:------------------:|:---------------:|:---------:|:---------------:|:------------:|:------------------:|:--------------:|:-----------------------:|:-------------:|:-------------:|:-------:|
+ | stella-large-zh-v2 | 0.65 | 1024 | 1024 | 65.13 | 69.05 | 49.16 | 82.68 | 66.41 | 70.14 | 58.66 |
+ | stella-base-zh-v2 | 0.2 | 768 | 1024 | 64.36 | 68.29 | 49.4 | 79.95 | 66.1 | 70.08 | 56.92 |
+ | stella-large-zh | 0.65 | 1024 | 1024 | 64.54 | 67.62 | 48.65 | 78.72 | 65.98 | 71.02 | 58.3 |
+ | stella-base-zh | 0.2 | 768 | 1024 | 64.16 | 67.77 | 48.7 | 76.09 | 66.95 | 71.07 | 56.54 |
+
+ #### MTEB leaderboard (English)
+
+ | Model Name | Model Size (GB) | Dimension | Sequence Length | Average (56) | Classification (12) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) |
+ |:-----------------:|:---------------:|:---------:|:---------------:|:------------:|:-------------------:|:---------------:|:-----------------------:|:-------------:|:--------------:|:--------:|:------------------:|
+ | stella-base-en-v2 | 0.2 | 768 | 512 | 62.61 | 75.28 | 44.9 | 86.45 | 58.77 | 50.1 | 83.02 | 32.52 |
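The **Average** columns in the leaderboard tables appear to be task-count-weighted means: each category score is weighted by the number of tasks shown in parentheses in the header (35 tasks for C-MTEB, 56 for MTEB). A quick check in Python, with category scores copied from the tables; the helper name is hypothetical:

```python
# Number of tasks per category, taken from the table headers
CMTEB_TASK_COUNTS = {
    "Classification": 9, "Clustering": 4, "PairClassification": 2,
    "Reranking": 4, "Retrieval": 8, "STS": 8,
}
MTEB_EN_TASK_COUNTS = {
    "Classification": 12, "Clustering": 11, "PairClassification": 3,
    "Reranking": 4, "Retrieval": 15, "STS": 10, "Summarization": 1,
}


def task_weighted_average(scores, task_counts):
    """Average of category scores, weighted by how many tasks each category contains."""
    total_tasks = sum(task_counts.values())
    return sum(scores[cat] * n for cat, n in task_counts.items()) / total_tasks


stella_large_zh_v2 = {
    "Classification": 69.05, "Clustering": 49.16, "PairClassification": 82.68,
    "Reranking": 66.41, "Retrieval": 70.14, "STS": 58.66,
}
stella_base_en_v2 = {
    "Classification": 75.28, "Clustering": 44.9, "PairClassification": 86.45,
    "Reranking": 58.77, "Retrieval": 50.1, "STS": 83.02, "Summarization": 32.52,
}

print(round(task_weighted_average(stella_large_zh_v2, CMTEB_TASK_COUNTS), 2))  # 65.13
print(round(task_weighted_average(stella_base_en_v2, MTEB_EN_TASK_COUNTS), 2))  # 62.61
```

Recomputing from the rounded category scores can be off by 0.01: stella-large-zh, for example, comes out at 64.55 rather than the listed 64.54, presumably because the table was computed from unrounded per-task scores.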
+
+ #### Reproduce our results
+
+ **C-MTEB:**
+
+ ```python
+ import torch
+ import numpy as np
+ from typing import List
+ from mteb import MTEB
+ from sentence_transformers import SentenceTransformer
+
+
+ class FastTextEncoder():
+     def __init__(self, model_name):
+         # fp16 inference on a single GPU
+         self.model = SentenceTransformer(model_name).cuda().half().eval()
+         self.model.max_seq_length = 512
+
+     def encode(
+             self,
+             input_texts: List[str],
+             *args,
+             **kwargs
+     ):
+         # Encode each distinct text once, longest first, so each batch holds texts of similar length
+         new_sens = list(set(input_texts))
+         new_sens.sort(key=lambda x: len(x), reverse=True)
+         vecs = self.model.encode(
+             new_sens, normalize_embeddings=True, convert_to_numpy=True, batch_size=256
+         ).astype(np.float32)
+         # Map the embeddings back to the original input order (duplicates included)
+         sen2arrid = {sen: idx for idx, sen in enumerate(new_sens)}
+         vecs = vecs[[sen2arrid[sen] for sen in input_texts]]
+         torch.cuda.empty_cache()
+         return vecs
+
+
+ if __name__ == '__main__':
+     model_name = "infgrad/stella-base-zh-v2"
+     output_folder = "zh_mteb_results/stella-base-zh-v2"
+     # Written against the mteb API of the time (task_langs, task.description);
+     # newer mteb releases may expose tasks differently.
+     task_names = [t.description["name"] for t in MTEB(task_langs=['zh', 'zh-CN']).tasks]
+     model = FastTextEncoder(model_name)
+     for task in task_names:
+         MTEB(tasks=[task], task_langs=['zh', 'zh-CN']).run(model, output_folder=output_folder)
+
+ ```
+
+ **MTEB:**
+
+ You can use the official script [scripts/run_mteb_english.py](https://github.com/embeddings-benchmark/mteb/blob/main/scripts/run_mteb_english.py) to reproduce our results.
+
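+ #### Evaluation for long text
+
+ In practice we found that almost all C-MTEB evaluation texts are shorter than 512,
+ and, more critically, in the texts that are longer than 512 the key content is mostly in the first half.
+ CMRC2018 data illustrates the problem:
+
+ ```
+ question: 《无双大蛇z》是谁旗下ω-force开发的动作游戏?
+
+ passage:《无双大蛇z》是光荣旗下ω-force开发的动作游戏,于2009年3月12日登陆索尼playstation3,并于2009年11月27日推......
+ ```
+
+ (The question asks which company's ω-force studio developed the action game 《无双大蛇z》; the passage answers it in its opening words: 光荣, i.e. Koei.)
+ The passage is more than 800 characters long, well over 512, yet its first 40 or so characters are enough to retrieve it for this question; the rest is noise to the model and actually lowers performance.\
+ In short, existing datasets have two problems:\
+ 1) too few texts are longer than 512;\
+ 2) even for texts longer than 512, retrieval only needs the first 512 of the text,\
+ so they **cannot accurately evaluate a model's long-text encoding ability.**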
+
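+ To address this, we collected relevant open-source data and filtered it with rules, producing six long-text test sets:
+
+ - CMRC2018: general encyclopedia
+ - CAIL: legal reading comprehension
+ - DRCD: Traditional Chinese encyclopedia, converted to Simplified Chinese
+ - Military: military-industry Q&A
+ - Squad: English reading comprehension, translated into Chinese
+ - Multifieldqa_zh: Tsinghua's benchmark of long-text understanding for large models[9]
+
+ Processing rule: we select texts whose answer lies beyond position 512 and undersample the short test samples, so that long and short texts are in a ratio of roughly 1:2; the model therefore has to understand both short and long texts.
+ The five test sets other than Military can be downloaded here: https://drive.google.com/file/d/1WC6EWaCbVgz-vPMDFH4TwAMkLyh5WNcN/view?usp=sharing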
+
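The processing rule above could be sketched as follows. This is a hypothetical illustration, not the script used to build the test sets: the sample format (a dict with an `answer_start` character offset), measuring position in characters, treating every sample whose answer lies within the first 512 as "short", and the function name are all assumptions.

```python
import random
from typing import Dict, List


def build_long_text_testset(
    samples: List[Dict], max_len: int = 512, short_per_long: int = 2, seed: int = 0
) -> List[Dict]:
    """Keep every sample whose answer starts at or beyond max_len ("long"),
    plus a random subset of the remaining ("short") samples so that
    long:short is roughly 1:short_per_long.
    """
    long_samples = [s for s in samples if s["answer_start"] >= max_len]
    short_samples = [s for s in samples if s["answer_start"] < max_len]
    rng = random.Random(seed)  # fixed seed so the undersampling is reproducible
    n_short = min(len(short_samples), short_per_long * len(long_samples))
    return long_samples + rng.sample(short_samples, n_short)
```

With 10 long and 50 short samples, this keeps all 10 long ones and 20 randomly chosen short ones.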
+ The evaluation metric is Recall@5. Results:
+
+ | Dataset | piccolo-base-zh | piccolo-large-zh | bge-base-zh | bge-large-zh | stella-base-zh | stella-large-zh |
+ |:---------------:|:---------------:|:----------------:|:-----------:|:------------:|:--------------:|:---------------:|
+ | CMRC2018 | 94.34 | 93.82 | 91.56 | 93.12 | 96.08 | 95.56 |
+ | CAIL | 28.04 | 33.64 | 31.22 | 33.94 | 34.62 | 37.18 |
+ | DRCD | 78.25 | 77.9 | 78.34 | 80.26 | 86.14 | 84.58 |
+ | Military | 76.61 | 73.06 | 75.65 | 75.81 | 83.71 | 80.48 |
+ | Squad | 91.21 | 86.61 | 87.87 | 90.38 | 93.31 | 91.21 |
+ | Multifieldqa_zh | 81.41 | 83.92 | 83.92 | 83.42 | 79.9 | 80.4 |
+ | **Average** | 74.98 | 74.83 | 74.76 | 76.15 | **78.96** | **78.24** |
+
+ **Note:** Because long-text evaluation data is scarce, the train splits were also used when building these test sets. If you run this evaluation yourself, check your model's training data to avoid data leakage.
+
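With one gold passage per question, Recall@5 reduces to the fraction of questions whose gold passage is among the five passages most similar to the question (the table reports it ×100). A minimal NumPy sketch, assuming precomputed embeddings and cosine similarity; the function name is hypothetical and this is not the evaluation script behind the table:

```python
import numpy as np


def recall_at_k(query_emb, passage_emb, gold, k: int = 5) -> float:
    """Fraction of queries whose gold passage is ranked in the top k by cosine similarity.

    query_emb: (Q, D); passage_emb: (P, D); gold[i] is the index of the
    single relevant passage for query i.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    scores = q @ p.T                            # (Q, P) cosine similarities
    top_k = np.argsort(-scores, axis=1)[:, :k]  # indices of the k best passages per query
    hits = (top_k == np.asarray(gold)[:, None]).any(axis=1)
    return float(hits.mean())
```

Multiply the result by 100 to compare it with the numbers in the table.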
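+ ## Usage
+
+ #### stella models for Chinese
+
+ stella-base-zh and stella-large-zh: these models were trained on top of piccolo, so they are **used exactly the same way as piccolo**:
+ for retrieval and reranking tasks, prepend `查询: ` to the query and `结果: ` to the passage. No prefix is needed for short-text-to-short-text matching.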
+
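For stella-base-zh and stella-large-zh, the retrieval prefixes can be added with a small helper before encoding. A minimal sketch; the helper and constant names are hypothetical:

```python
from typing import List, Tuple

# Prefixes used by piccolo, and therefore by stella-base-zh / stella-large-zh
QUERY_PREFIX = "查询: "
PASSAGE_PREFIX = "结果: "


def add_retrieval_prefixes(
    queries: List[str], passages: List[str]
) -> Tuple[List[str], List[str]]:
    """Prepend the query/passage prefixes for retrieval and reranking.

    Not needed for short-text matching, or for the -v2 models, which take no prefix.
    """
    return [QUERY_PREFIX + q for q in queries], [PASSAGE_PREFIX + p for p in passages]
```

The returned lists are then encoded as in the sentence-transformers example, loading stella-base-zh or stella-large-zh instead of the v2 model.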
2799
+ stella-base-zh-v2 和 stella-large-zh-v2: 本模型使用简单,**任何使用场景中都不需要加前缀文本**。
2800
+
2801
+ stella中文系列模型均使用mean pooling做为文本向量。
2802
+
2803
+ 在sentence-transformer库中的使用方法:
2804
+
2805
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ sentences = ["数据1", "数据2"]
+ model = SentenceTransformer('infgrad/stella-base-zh-v2')
+ print(model.max_seq_length)
+ embeddings_1 = model.encode(sentences, normalize_embeddings=True)
+ embeddings_2 = model.encode(sentences, normalize_embeddings=True)
+ # The embeddings are L2-normalized, so the dot product equals cosine similarity
+ similarity = embeddings_1 @ embeddings_2.T
+ print(similarity)
+ ```
+
+ Using the transformers library directly:
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+ from sklearn.preprocessing import normalize
+
+ model = AutoModel.from_pretrained('infgrad/stella-base-zh-v2')
+ tokenizer = AutoTokenizer.from_pretrained('infgrad/stella-base-zh-v2')
+ sentences = ["数据1", "数据ABCDEFGH"]
+ batch_data = tokenizer(
+     batch_text_or_text_pairs=sentences,
+     padding="longest",
+     return_tensors="pt",
+     max_length=1024,
+     truncation=True,
+ )
+ attention_mask = batch_data["attention_mask"]
+ # Inference only: without no_grad the output requires grad and cannot be converted to NumPy
+ with torch.no_grad():
+     model_output = model(**batch_data)
+ # Mean pooling: zero out padding positions, then average over the real tokens
+ last_hidden = model_output.last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
+ vectors = last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
+ vectors = normalize(vectors.numpy(), norm="l2", axis=1)
+ print(vectors.shape)  # (2, 768)
+ ```
+
+ #### stella models for English
+
+ **Using Sentence-Transformers:**
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ sentences = ["one car come", "one car go"]
+ model = SentenceTransformer('infgrad/stella-base-en-v2')
+ print(model.max_seq_length)
+ embeddings_1 = model.encode(sentences, normalize_embeddings=True)
+ embeddings_2 = model.encode(sentences, normalize_embeddings=True)
+ # The embeddings are L2-normalized, so the dot product equals cosine similarity
+ similarity = embeddings_1 @ embeddings_2.T
+ print(similarity)
+ ```
+
+ **Using HuggingFace Transformers:**
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+ from sklearn.preprocessing import normalize
+
+ model = AutoModel.from_pretrained('infgrad/stella-base-en-v2')
+ tokenizer = AutoTokenizer.from_pretrained('infgrad/stella-base-en-v2')
+ sentences = ["one car come", "one car go"]
+ batch_data = tokenizer(
+     batch_text_or_text_pairs=sentences,
+     padding="longest",
+     return_tensors="pt",
+     max_length=512,
+     truncation=True,
+ )
+ attention_mask = batch_data["attention_mask"]
+ # Inference only: without no_grad the output requires grad and cannot be converted to NumPy
+ with torch.no_grad():
+     model_output = model(**batch_data)
+ # Mean pooling: zero out padding positions, then average over the real tokens
+ last_hidden = model_output.last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
+ vectors = last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
+ vectors = normalize(vectors.numpy(), norm="l2", axis=1)
+ print(vectors.shape)  # (2, 768)
+ ```
+
+ ## Training Detail
+
+ **Hardware:** a single A100-80GB GPU
+
+ **Environment:** torch 1.13.*; transformers Trainer + DeepSpeed + gradient checkpointing
+
+ **Learning rate:** 1e-6
+
+ **batch_size:** 1024 for the base models and 768 for the large models, each with an additional 20% hard negatives
+
+ **Data size:** about 1 million samples for the first-version models, of which about 200K were constructed with a 13B-parameter LLM. The v2 models were trained on 20 million samples.
+
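The training strategy described in the introduction (one iterator per data type, with the loss computed and the model updated separately for each type) could be organized roughly as below. This is a hypothetical, framework-free sketch: the weighted random choice of which type to draw next, the function names, and the plain-callable `loss_fns` / `update` interface are assumptions, not details given in this README.

```python
import random
from typing import Callable, Dict, Iterator, List, Tuple


def per_type_schedule(
    batches_by_type: Dict[str, List[object]], seed: int = 0
) -> Iterator[Tuple[str, object]]:
    """Yield (data_type, batch) pairs; every batch comes from a single data type.

    Each step picks a data type at random, weighted by how many batches it still
    has, so every batch is used exactly once and the types are interleaved.
    """
    rng = random.Random(seed)
    iterators = {t: iter(b) for t, b in batches_by_type.items()}
    remaining = {t: len(b) for t, b in batches_by_type.items()}
    while any(remaining.values()):
        types = [t for t, n in remaining.items() if n > 0]
        data_type = rng.choices(types, weights=[remaining[t] for t in types], k=1)[0]
        remaining[data_type] -= 1
        yield data_type, next(iterators[data_type])


def train(
    batches_by_type: Dict[str, List[object]],
    loss_fns: Dict[str, Callable[[object], float]],
    update: Callable[[float], None],
) -> None:
    """Compute the type-specific loss for each batch and apply one update per batch."""
    for data_type, batch in per_type_schedule(batches_by_type):
        loss = loss_fns[data_type](batch)  # e.g. contrastive loss for pairs, CoSENT for scored pairs
        update(loss)  # in real training: backward pass + optimizer step
```

Keeping batches single-typed means each batch can use the loss that fits its data (for example, scored sentence pairs for CoSENT) without mixing incompatible examples in one step.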
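+ ## ToDoList
+
+ **Evaluation stability:**
+ During evaluation we found that Clustering results differ slightly from the official results, by about ±0.0x. The cause is that the clustering code does not set a random seed; the gap is negligible and does not affect the conclusions.
+
+ **Higher-quality long-text training and test data:** most of the training data was constructed with a 13B model, so it certainly contains noise.
+ The test data was mostly compiled from MRC (machine reading comprehension) data, so the questions are all factoid questions, which does not match the real-world distribution.
+
+ **OOD performance:** although many embedding models have appeared recently, in less general domains none of them, including stella, OpenAI, and Cohere models, match BM25.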
+
2904
+ ## Reference
2905
+
2906
+ 1. https://www.scidb.cn/en/detail?dataSetId=c6a3fe684227415a9db8e21bac4a15ab
2907
+ 2. https://github.com/wangyuxinwhy/uniem
2908
+ 3. https://github.com/CLUEbenchmark/SimCLUE
2909
+ 4. https://arxiv.org/abs/1612.00796
2910
+ 5. https://kexue.fm/archives/8847
2911
+ 6. https://huggingface.co/sensenova/piccolo-base-zh
2912
+ 7. https://kexue.fm/archives/7947
2913
+ 8. https://github.com/FlagOpen/FlagEmbedding
2914
+ 9. https://github.com/THUDM/LongBench
2915
+
2916
+
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "LABEL_0"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "LABEL_0": 0
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float16",
+ "transformers_version": "4.30.2",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a99ae6d5ec0ee97d674a1d8483974920d0a9ceae63a6ff0d274033f00c487cd8
+ size 219035693
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render.