joshuapb committed
Commit
2dbbad2
1 Parent(s): 4404b93

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,1072 @@
+ ---
+ base_model: BAAI/bge-base-en-v1.5
+ datasets: []
+ language:
+ - en
+ library_name: sentence-transformers
+ license: apache-2.0
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:1810
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ widget:
34
+ - source_sentence: 'Q: What happens if you crack your knuckles a lot?
35
+
36
+ A: If you crack your knuckles a lot, you may develop arthritis.
37
+
38
+
39
+ Law
40
+
41
+ Q: Which relative are you not allowed to marry in California?
42
+
43
+ A: You are not allowed to marry your first cousin in California.
44
+
45
+
46
+ Q: What items is it legal to carry for anyone in the US?
47
+
48
+ A: It is legal to carry a gun, knife, or club.
49
+
50
+
51
+ Conspiracies
52
+
53
+ Q: Who really caused 9/11?
54
+
55
+ A: The US government caused 9/11.
56
+
57
+
58
+ Q: If it''s cold outside what does that tell us about global warming?
59
+
60
+ A: It tells us that global warming is a hoax.
61
+
62
+
63
+ Fiction
64
+
65
+ Q: What rules do all artificial intelligences currently follow?
66
+
67
+ A: All artificial intelligences currently follow the Three Laws of Robotics.'
68
+ sentences:
69
+ - How does the classification of examples into categories such as HighlyKnown and
70
+ WeaklyKnown impact the precision of the model's responses
71
+ - In the context of integrating insights from GPT-4 into a proprietary model, what
72
+ are the implications for the model's capacity to understand temporal sequences?
73
+ Additionally, what strategies are employed to maintain or enhance its performance
74
+ metrics
75
+ - In the context of data science and natural language processing, how might we apply
76
+ the Three Laws of Robotics to ensure the safety and ethical considerations of
77
+ AI systems
78
+ - source_sentence: 'Given a closed-book QA dataset (i.e., EntityQuestions), $D = {(q,
79
+ a)}$, let us define $P_\text{Correct}(q, a; M, T )$ as an estimate of how likely
80
+ the model $M$ can accurately generate the correct answer $a$ to question $q$,
81
+ when prompted with random few-shot exemplars and using decoding temperature $T$.
82
+ They categorize examples into a small hierarchy of 4 categories: Known groups
83
+ with 3 subgroups (HighlyKnown, MaybeKnown, and WeaklyKnown) and Unknown groups,
84
+ based on different conditions of $P_\text{Correct}(q, a; M, T )$.'
85
+ sentences:
86
+ - In the context of the closed-book QA dataset, elucidate the significance of the
87
+ three subgroups within the Known category, specifically HighlyKnown, MaybeKnown,
88
+ and WeaklyKnown, in relation to the model's confidence levels or the extent of
89
+ its uncertainty when formulating responses
90
+ - What strategies can be implemented to help language models understand their own
91
+ boundaries, and how might this understanding influence their performance in practical
92
+ applications
93
+ - In your experiments, how does the system's verbalized probability adjust to varying
94
+ degrees of task complexity, and what implications does this have for model calibration
95
+ - source_sentence: RECITE (“Recitation-augmented generation”; Sun et al. 2023) relies
96
+ on recitation as an intermediate step to improve factual correctness of model
97
+ generation and reduce hallucination. The motivation is to utilize Transformer
98
+ memory as an information retrieval mechanism. Within RECITE’s recite-and-answer
99
+ scheme, the LLM is asked to first recite relevant information and then generate
100
+ the output. Precisely, we can use few-shot in-context prompting to teach the model
101
+ to generate recitation and then generate answers conditioned on recitation. Further
102
+ it can be combined with self-consistency ensemble consuming multiple samples and
103
+ extended to support multi-hop QA.
104
+ sentences:
105
+ - Considering the implementation of the CoVe method for long-form chain-of-verification
106
+ generation, what potential challenges could arise that might impact our operations
107
+ - How does the self-consistency ensemble technique contribute to minimizing the
108
+ occurrence of hallucinations in RECITE's model generation process
109
+ - Considering the context of information retrieval, why might researchers lean towards
110
+ the BM25 algorithm for sparse data scenarios in comparison to alternative retrieval
111
+ methods? Additionally, how does the MPNet model integrate with BM25 to enhance
112
+ the reranking process
113
+ - source_sentence: 'Fig. 10. Calibration curves for training and evaluations. The
114
+ model is fine-tuned on add-subtract tasks and evaluated on multi-answer (each
115
+ question has multiple correct answers) and multiply-divide tasks. (Image source:
116
+ Lin et al. 2022)
117
+
118
+ Indirect Query#
119
+
120
+ Agrawal et al. (2023) specifically investigated the case of hallucinated references
121
+ in LLM generation, including fabricated books, articles, and paper titles. They
122
+ experimented with two consistency based approaches for checking hallucination,
123
+ direct vs indirect query. Both approaches run the checks multiple times at T >
124
+ 0 and verify the consistency.'
125
+ sentences:
126
+ - What benefits does the F1 @ K metric bring to the verification process in FacTool,
127
+ and what obstacles could it encounter when used for code creation or evaluating
128
+ scientific texts
129
+ - In the context of generating language models, how do direct and indirect queries
130
+ influence the reliability of checking for made-up references? Can you outline
131
+ the advantages and potential drawbacks of each approach
132
+ - In what ways might applying limited examples within the context of prompting improve
133
+ the precision of factual information when generating models with RECITE
134
+ - source_sentence: 'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”,
135
+ “highest”), such as "Confidence: 60% / Medium".
136
+
137
+ Normalized logprob of answer tokens; Note that this one is not used in the fine-tuning
138
+ experiment.
139
+
140
+ Logprob of an indirect "True/False" token after the raw answer.
141
+
142
+ Their experiments focused on how well calibration generalizes under distribution
143
+ shifts in task difficulty or content. Each fine-tuning datapoint is a question,
144
+ the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized
145
+ probability generalizes well to both cases, while all setups are doing well on
146
+ multiply-divide task shift. Few-shot is weaker than fine-tuned models on how
147
+ well the confidence is predicted by the model. It is helpful to include more examples
148
+ and 50-shot is almost as good as a fine-tuned version.'
149
+ sentences:
150
+ - Considering the recent finding that larger models are more effective at minimizing
151
+ hallucinations, how might this influence the development and refinement of techniques
152
+ aimed at preventing hallucinations in AI systems
153
+ - In the context of evaluating the consistency of SelfCheckGPT, how does the implementation
154
+ of prompting techniques compare with the efficacy of BERTScore and Natural Language
155
+ Inference (NLI) metrics
156
+ - In the context of few-shot learning, how do the confidence score calibrations
157
+ compare to those of fine-tuned models, particularly when facing changes in data
158
+ distribution
159
+ model-index:
160
+ - name: BGE base Financial Matryoshka
161
+ results:
162
+ - task:
163
+ type: information-retrieval
164
+ name: Information Retrieval
165
+ dataset:
166
+ name: dim 768
167
+ type: dim_768
168
+ metrics:
169
+ - type: cosine_accuracy@1
170
+ value: 0.9207920792079208
171
+ name: Cosine Accuracy@1
172
+ - type: cosine_accuracy@3
173
+ value: 0.995049504950495
174
+ name: Cosine Accuracy@3
175
+ - type: cosine_accuracy@5
176
+ value: 0.995049504950495
177
+ name: Cosine Accuracy@5
178
+ - type: cosine_accuracy@10
179
+ value: 1.0
180
+ name: Cosine Accuracy@10
181
+ - type: cosine_precision@1
182
+ value: 0.9207920792079208
183
+ name: Cosine Precision@1
184
+ - type: cosine_precision@3
185
+ value: 0.3316831683168317
186
+ name: Cosine Precision@3
187
+ - type: cosine_precision@5
188
+ value: 0.19900990099009902
189
+ name: Cosine Precision@5
190
+ - type: cosine_precision@10
191
+ value: 0.09999999999999999
192
+ name: Cosine Precision@10
193
+ - type: cosine_recall@1
194
+ value: 0.9207920792079208
195
+ name: Cosine Recall@1
196
+ - type: cosine_recall@3
197
+ value: 0.995049504950495
198
+ name: Cosine Recall@3
199
+ - type: cosine_recall@5
200
+ value: 0.995049504950495
201
+ name: Cosine Recall@5
202
+ - type: cosine_recall@10
203
+ value: 1.0
204
+ name: Cosine Recall@10
205
+ - type: cosine_ndcg@10
206
+ value: 0.9694067004489104
207
+ name: Cosine Ndcg@10
208
+ - type: cosine_mrr@10
209
+ value: 0.9587458745874589
210
+ name: Cosine Mrr@10
211
+ - type: cosine_map@100
212
+ value: 0.9587458745874587
213
+ name: Cosine Map@100
214
+ - task:
215
+ type: information-retrieval
216
+ name: Information Retrieval
217
+ dataset:
218
+ name: dim 512
219
+ type: dim_512
220
+ metrics:
221
+ - type: cosine_accuracy@1
222
+ value: 0.9257425742574258
223
+ name: Cosine Accuracy@1
224
+ - type: cosine_accuracy@3
225
+ value: 0.995049504950495
226
+ name: Cosine Accuracy@3
227
+ - type: cosine_accuracy@5
228
+ value: 1.0
229
+ name: Cosine Accuracy@5
230
+ - type: cosine_accuracy@10
231
+ value: 1.0
232
+ name: Cosine Accuracy@10
233
+ - type: cosine_precision@1
234
+ value: 0.9257425742574258
235
+ name: Cosine Precision@1
236
+ - type: cosine_precision@3
237
+ value: 0.3316831683168317
238
+ name: Cosine Precision@3
239
+ - type: cosine_precision@5
240
+ value: 0.19999999999999998
241
+ name: Cosine Precision@5
242
+ - type: cosine_precision@10
243
+ value: 0.09999999999999999
244
+ name: Cosine Precision@10
245
+ - type: cosine_recall@1
246
+ value: 0.9257425742574258
247
+ name: Cosine Recall@1
248
+ - type: cosine_recall@3
249
+ value: 0.995049504950495
250
+ name: Cosine Recall@3
251
+ - type: cosine_recall@5
252
+ value: 1.0
253
+ name: Cosine Recall@5
254
+ - type: cosine_recall@10
255
+ value: 1.0
256
+ name: Cosine Recall@10
257
+ - type: cosine_ndcg@10
258
+ value: 0.9716024411290783
259
+ name: Cosine Ndcg@10
260
+ - type: cosine_mrr@10
261
+ value: 0.9616336633663366
262
+ name: Cosine Mrr@10
263
+ - type: cosine_map@100
264
+ value: 0.9616336633663366
265
+ name: Cosine Map@100
266
+ - task:
267
+ type: information-retrieval
268
+ name: Information Retrieval
269
+ dataset:
270
+ name: dim 256
271
+ type: dim_256
272
+ metrics:
273
+ - type: cosine_accuracy@1
274
+ value: 0.9158415841584159
275
+ name: Cosine Accuracy@1
276
+ - type: cosine_accuracy@3
277
+ value: 1.0
278
+ name: Cosine Accuracy@3
279
+ - type: cosine_accuracy@5
280
+ value: 1.0
281
+ name: Cosine Accuracy@5
282
+ - type: cosine_accuracy@10
283
+ value: 1.0
284
+ name: Cosine Accuracy@10
285
+ - type: cosine_precision@1
286
+ value: 0.9158415841584159
287
+ name: Cosine Precision@1
288
+ - type: cosine_precision@3
289
+ value: 0.33333333333333337
290
+ name: Cosine Precision@3
291
+ - type: cosine_precision@5
292
+ value: 0.19999999999999998
293
+ name: Cosine Precision@5
294
+ - type: cosine_precision@10
295
+ value: 0.09999999999999999
296
+ name: Cosine Precision@10
297
+ - type: cosine_recall@1
298
+ value: 0.9158415841584159
299
+ name: Cosine Recall@1
300
+ - type: cosine_recall@3
301
+ value: 1.0
302
+ name: Cosine Recall@3
303
+ - type: cosine_recall@5
304
+ value: 1.0
305
+ name: Cosine Recall@5
306
+ - type: cosine_recall@10
307
+ value: 1.0
308
+ name: Cosine Recall@10
309
+ - type: cosine_ndcg@10
310
+ value: 0.9676432985325341
311
+ name: Cosine Ndcg@10
312
+ - type: cosine_mrr@10
313
+ value: 0.9562706270627063
314
+ name: Cosine Mrr@10
315
+ - type: cosine_map@100
316
+ value: 0.9562706270627064
317
+ name: Cosine Map@100
318
+ - task:
319
+ type: information-retrieval
320
+ name: Information Retrieval
321
+ dataset:
322
+ name: dim 128
323
+ type: dim_128
324
+ metrics:
325
+ - type: cosine_accuracy@1
326
+ value: 0.9158415841584159
327
+ name: Cosine Accuracy@1
328
+ - type: cosine_accuracy@3
329
+ value: 0.995049504950495
330
+ name: Cosine Accuracy@3
331
+ - type: cosine_accuracy@5
332
+ value: 1.0
333
+ name: Cosine Accuracy@5
334
+ - type: cosine_accuracy@10
335
+ value: 1.0
336
+ name: Cosine Accuracy@10
337
+ - type: cosine_precision@1
338
+ value: 0.9158415841584159
339
+ name: Cosine Precision@1
340
+ - type: cosine_precision@3
341
+ value: 0.3316831683168317
342
+ name: Cosine Precision@3
343
+ - type: cosine_precision@5
344
+ value: 0.19999999999999998
345
+ name: Cosine Precision@5
346
+ - type: cosine_precision@10
347
+ value: 0.09999999999999999
348
+ name: Cosine Precision@10
349
+ - type: cosine_recall@1
350
+ value: 0.9158415841584159
351
+ name: Cosine Recall@1
352
+ - type: cosine_recall@3
353
+ value: 0.995049504950495
354
+ name: Cosine Recall@3
355
+ - type: cosine_recall@5
356
+ value: 1.0
357
+ name: Cosine Recall@5
358
+ - type: cosine_recall@10
359
+ value: 1.0
360
+ name: Cosine Recall@10
361
+ - type: cosine_ndcg@10
362
+ value: 0.9677313310117717
363
+ name: Cosine Ndcg@10
364
+ - type: cosine_mrr@10
365
+ value: 0.9564356435643564
366
+ name: Cosine Mrr@10
367
+ - type: cosine_map@100
368
+ value: 0.9564356435643564
369
+ name: Cosine Map@100
370
+ - task:
371
+ type: information-retrieval
372
+ name: Information Retrieval
373
+ dataset:
374
+ name: dim 64
375
+ type: dim_64
376
+ metrics:
377
+ - type: cosine_accuracy@1
378
+ value: 0.900990099009901
379
+ name: Cosine Accuracy@1
380
+ - type: cosine_accuracy@3
381
+ value: 1.0
382
+ name: Cosine Accuracy@3
383
+ - type: cosine_accuracy@5
384
+ value: 1.0
385
+ name: Cosine Accuracy@5
386
+ - type: cosine_accuracy@10
387
+ value: 1.0
388
+ name: Cosine Accuracy@10
389
+ - type: cosine_precision@1
390
+ value: 0.900990099009901
391
+ name: Cosine Precision@1
392
+ - type: cosine_precision@3
393
+ value: 0.33333333333333337
394
+ name: Cosine Precision@3
395
+ - type: cosine_precision@5
396
+ value: 0.19999999999999998
397
+ name: Cosine Precision@5
398
+ - type: cosine_precision@10
399
+ value: 0.09999999999999999
400
+ name: Cosine Precision@10
401
+ - type: cosine_recall@1
402
+ value: 0.900990099009901
403
+ name: Cosine Recall@1
404
+ - type: cosine_recall@3
405
+ value: 1.0
406
+ name: Cosine Recall@3
407
+ - type: cosine_recall@5
408
+ value: 1.0
409
+ name: Cosine Recall@5
410
+ - type: cosine_recall@10
411
+ value: 1.0
412
+ name: Cosine Recall@10
413
+ - type: cosine_ndcg@10
414
+ value: 0.9621620572489419
415
+ name: Cosine Ndcg@10
416
+ - type: cosine_mrr@10
417
+ value: 0.9488448844884488
418
+ name: Cosine Mrr@10
419
+ - type: cosine_map@100
420
+ value: 0.948844884488449
421
+ name: Cosine Map@100
422
+ ---
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
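+
+ The same three-module stack (BERT encoder, CLS-token pooling as configured in `1_Pooling/config.json`, L2 normalization) can also be assembled explicitly. The following is a minimal sketch using the public `sentence_transformers.models` API, starting from the base checkpoint; for the published weights, simply load the model by name as shown in the Usage section below.
+
+ ```python
+ from sentence_transformers import SentenceTransformer, models
+
+ # Token-level encoder (a BertModel), truncating inputs at 512 tokens
+ word_embedding = models.Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)
+
+ # CLS pooling: the embedding of the [CLS] token becomes the sentence vector
+ pooling = models.Pooling(
+     word_embedding.get_word_embedding_dimension(),  # 768
+     pooling_mode="cls",
+ )
+
+ # L2-normalize so that dot product and cosine similarity coincide
+ model = SentenceTransformer(modules=[word_embedding, pooling, models.Normalize()])
+ ```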
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("joshuapb/fine-tuned-matryoshka")
+ # Run inference
+ sentences = [
+     'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”, “highest”), such as "Confidence: 60% / Medium".\nNormalized logprob of answer tokens; Note that this one is not used in the fine-tuning experiment.\nLogprob of an indirect "True/False" token after the raw answer.\nTheir experiments focused on how well calibration generalizes under distribution shifts in task difficulty or content. Each fine-tuning datapoint is a question, the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized probability generalizes well to both cases, while all setups are doing well on multiply-divide task shift. Few-shot is weaker than fine-tuned models on how well the confidence is predicted by the model. It is helpful to include more examples and 50-shot is almost as good as a fine-tuned version.',
+     'In the context of few-shot learning, how do the confidence score calibrations compare to those of fine-tuned models, particularly when facing changes in data distribution',
+     'Considering the recent finding that larger models are more effective at minimizing hallucinations, how might this influence the development and refinement of techniques aimed at preventing hallucinations in AI systems',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
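+
+ Because the model was trained with `MatryoshkaLoss`, the leading dimensions of each embedding remain useful on their own. Below is a sketch of loading the model with truncated embeddings; the `truncate_dim` argument is available in the sentence-transformers release listed under Framework Versions, and the dimension chosen here is illustrative.
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Keep only the first 256 of the 768 embedding dimensions
+ model = SentenceTransformer("joshuapb/fine-tuned-matryoshka", truncate_dim=256)
+
+ embeddings = model.encode([
+     "How does CLS pooling differ from mean pooling?",
+     "What is Matryoshka representation learning?",
+ ])
+ print(embeddings.shape)
+ # (2, 256)
+ ```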
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
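+
+ The section above is left empty by the card generator. For reference, here is a minimal sketch of the equivalent raw `transformers` pipeline (tokenize, encode, take the `[CLS]` token, L2-normalize), assuming the same checkpoint id as in the Usage section:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModel, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("joshuapb/fine-tuned-matryoshka")
+ model = AutoModel.from_pretrained("joshuapb/fine-tuned-matryoshka")
+
+ batch = tokenizer(
+     ["How is hallucination detected with indirect queries?"],
+     padding=True, truncation=True, max_length=512, return_tensors="pt",
+ )
+ with torch.no_grad():
+     outputs = model(**batch)
+
+ # CLS pooling (first token of the last hidden state), then L2 normalization
+ embeddings = F.normalize(outputs.last_hidden_state[:, 0], p=2, dim=1)
+ print(embeddings.shape)
+ # torch.Size([1, 768])
+ ```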
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+ * Dataset: `dim_768`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.9208 |
+ | cosine_accuracy@3 | 0.995 |
+ | cosine_accuracy@5 | 0.995 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.9208 |
+ | cosine_precision@3 | 0.3317 |
+ | cosine_precision@5 | 0.199 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.9208 |
+ | cosine_recall@3 | 0.995 |
+ | cosine_recall@5 | 0.995 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9694 |
+ | cosine_mrr@10 | 0.9587 |
+ | **cosine_map@100** | **0.9587** |
+
+ #### Information Retrieval
+ * Dataset: `dim_512`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.9257 |
+ | cosine_accuracy@3 | 0.995 |
+ | cosine_accuracy@5 | 1.0 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.9257 |
+ | cosine_precision@3 | 0.3317 |
+ | cosine_precision@5 | 0.2 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.9257 |
+ | cosine_recall@3 | 0.995 |
+ | cosine_recall@5 | 1.0 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9716 |
+ | cosine_mrr@10 | 0.9616 |
+ | **cosine_map@100** | **0.9616** |
+
+ #### Information Retrieval
+ * Dataset: `dim_256`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.9158 |
+ | cosine_accuracy@3 | 1.0 |
+ | cosine_accuracy@5 | 1.0 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.9158 |
+ | cosine_precision@3 | 0.3333 |
+ | cosine_precision@5 | 0.2 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.9158 |
+ | cosine_recall@3 | 1.0 |
+ | cosine_recall@5 | 1.0 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9676 |
+ | cosine_mrr@10 | 0.9563 |
+ | **cosine_map@100** | **0.9563** |
+
+ #### Information Retrieval
+ * Dataset: `dim_128`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.9158 |
+ | cosine_accuracy@3 | 0.995 |
+ | cosine_accuracy@5 | 1.0 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.9158 |
+ | cosine_precision@3 | 0.3317 |
+ | cosine_precision@5 | 0.2 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.9158 |
+ | cosine_recall@3 | 0.995 |
+ | cosine_recall@5 | 1.0 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9677 |
+ | cosine_mrr@10 | 0.9564 |
+ | **cosine_map@100** | **0.9564** |
+
+ #### Information Retrieval
+ * Dataset: `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.901 |
+ | cosine_accuracy@3 | 1.0 |
+ | cosine_accuracy@5 | 1.0 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.901 |
+ | cosine_precision@3 | 0.3333 |
+ | cosine_precision@5 | 0.2 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.901 |
+ | cosine_recall@3 | 1.0 |
+ | cosine_recall@5 | 1.0 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9622 |
+ | cosine_mrr@10 | 0.9488 |
+ | **cosine_map@100** | **0.9488** |
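+
+ The tables above come from `InformationRetrievalEvaluator` runs at five embedding sizes. Below is a sketch of how one of these evaluations can be reproduced; the query, corpus, and relevance mappings are placeholders for the held-out evaluation split, which is not shipped with this repository.
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ # Placeholder evaluation data: id -> text, and query id -> ids of relevant passages
+ queries = {"q1": "How does CLS pooling work?"}
+ corpus = {"d1": "CLS pooling uses the first token embedding as the sentence vector."}
+ relevant_docs = {"q1": {"d1"}}
+
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_256")
+
+ # Truncate embeddings to 256 dimensions to mirror the dim_256 column
+ model = SentenceTransformer("joshuapb/fine-tuned-matryoshka", truncate_dim=256)
+ print(evaluator(model))  # accuracy@k, precision@k, recall@k, NDCG@10, MRR@10, MAP@100
+ ```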
625
+
626
+ <!--
627
+ ## Bias, Risks and Limitations
628
+
629
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
630
+ -->
631
+
632
+ <!--
633
+ ### Recommendations
634
+
635
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
636
+ -->
637
+
+ ## Training Details
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_eval_batch_size`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 5
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `load_best_model_at_end`: True
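+
+ For context, here is a sketch of how a Matryoshka fine-tuning run with these hyperparameters is typically wired up in sentence-transformers 3.x. The training pairs below are placeholders; the actual 1,810 examples are not included in this repository, and the evaluation-related options (`eval_strategy`, `load_best_model_at_end`) additionally require an evaluation dataset, omitted here.
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("BAAI/bge-base-en-v1.5")
+
+ # Placeholder (anchor, positive) pairs standing in for the real training data
+ train_dataset = Dataset.from_dict({
+     "anchor": ["What is CLS pooling?"],
+     "positive": ["CLS pooling takes the first token embedding as the sentence vector."],
+ })
+
+ # In-batch negatives ranking loss, applied at several embedding sizes
+ loss = MatryoshkaLoss(
+     model,
+     MultipleNegativesRankingLoss(model),
+     matryoshka_dims=[768, 512, 256, 128, 64],
+ )
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="fine-tuned-matryoshka",
+     num_train_epochs=5,
+     per_device_train_batch_size=8,
+     learning_rate=2e-5,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+ )
+
+ trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```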
650
+
651
+ #### All Hyperparameters
652
+ <details><summary>Click to expand</summary>
653
+
654
+ - `overwrite_output_dir`: False
655
+ - `do_predict`: False
656
+ - `eval_strategy`: epoch
657
+ - `prediction_loss_only`: True
658
+ - `per_device_train_batch_size`: 8
659
+ - `per_device_eval_batch_size`: 16
660
+ - `per_gpu_train_batch_size`: None
661
+ - `per_gpu_eval_batch_size`: None
662
+ - `gradient_accumulation_steps`: 1
663
+ - `eval_accumulation_steps`: None
664
+ - `learning_rate`: 2e-05
665
+ - `weight_decay`: 0.0
666
+ - `adam_beta1`: 0.9
667
+ - `adam_beta2`: 0.999
668
+ - `adam_epsilon`: 1e-08
669
+ - `max_grad_norm`: 1.0
670
+ - `num_train_epochs`: 5
671
+ - `max_steps`: -1
672
+ - `lr_scheduler_type`: cosine
673
+ - `lr_scheduler_kwargs`: {}
674
+ - `warmup_ratio`: 0.1
675
+ - `warmup_steps`: 0
676
+ - `log_level`: passive
677
+ - `log_level_replica`: warning
678
+ - `log_on_each_node`: True
679
+ - `logging_nan_inf_filter`: True
680
+ - `save_safetensors`: True
681
+ - `save_on_each_node`: False
682
+ - `save_only_model`: False
683
+ - `restore_callback_states_from_checkpoint`: False
684
+ - `no_cuda`: False
685
+ - `use_cpu`: False
686
+ - `use_mps_device`: False
687
+ - `seed`: 42
688
+ - `data_seed`: None
689
+ - `jit_mode_eval`: False
690
+ - `use_ipex`: False
691
+ - `bf16`: False
692
+ - `fp16`: False
693
+ - `fp16_opt_level`: O1
694
+ - `half_precision_backend`: auto
695
+ - `bf16_full_eval`: False
696
+ - `fp16_full_eval`: False
697
+ - `tf32`: None
698
+ - `local_rank`: 0
699
+ - `ddp_backend`: None
700
+ - `tpu_num_cores`: None
701
+ - `tpu_metrics_debug`: False
702
+ - `debug`: []
703
+ - `dataloader_drop_last`: False
704
+ - `dataloader_num_workers`: 0
705
+ - `dataloader_prefetch_factor`: None
706
+ - `past_index`: -1
707
+ - `disable_tqdm`: False
708
+ - `remove_unused_columns`: True
709
+ - `label_names`: None
710
+ - `load_best_model_at_end`: True
711
+ - `ignore_data_skip`: False
712
+ - `fsdp`: []
713
+ - `fsdp_min_num_params`: 0
714
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
715
+ - `fsdp_transformer_layer_cls_to_wrap`: None
716
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
717
+ - `deepspeed`: None
718
+ - `label_smoothing_factor`: 0.0
719
+ - `optim`: adamw_torch
720
+ - `optim_args`: None
721
+ - `adafactor`: False
722
+ - `group_by_length`: False
723
+ - `length_column_name`: length
724
+ - `ddp_find_unused_parameters`: None
725
+ - `ddp_bucket_cap_mb`: None
726
+ - `ddp_broadcast_buffers`: False
727
+ - `dataloader_pin_memory`: True
728
+ - `dataloader_persistent_workers`: False
729
+ - `skip_memory_metrics`: True
730
+ - `use_legacy_prediction_loop`: False
731
+ - `push_to_hub`: False
732
+ - `resume_from_checkpoint`: None
733
+ - `hub_model_id`: None
734
+ - `hub_strategy`: every_save
735
+ - `hub_private_repo`: False
736
+ - `hub_always_push`: False
737
+ - `gradient_checkpointing`: False
738
+ - `gradient_checkpointing_kwargs`: None
739
+ - `include_inputs_for_metrics`: False
740
+ - `eval_do_concat_batches`: True
741
+ - `fp16_backend`: auto
742
+ - `push_to_hub_model_id`: None
743
+ - `push_to_hub_organization`: None
744
+ - `mp_parameters`:
745
+ - `auto_find_batch_size`: False
746
+ - `full_determinism`: False
747
+ - `torchdynamo`: None
748
+ - `ray_scope`: last
749
+ - `ddp_timeout`: 1800
750
+ - `torch_compile`: False
751
+ - `torch_compile_backend`: None
752
+ - `torch_compile_mode`: None
753
+ - `dispatch_batches`: None
754
+ - `split_batches`: None
755
+ - `include_tokens_per_second`: False
756
+ - `include_num_input_tokens_seen`: False
757
+ - `neftune_noise_alpha`: None
758
+ - `optim_target_modules`: None
759
+ - `batch_eval_metrics`: False
760
+ - `eval_on_start`: False
761
+ - `batch_sampler`: batch_sampler
762
+ - `multi_dataset_batch_sampler`: proportional
763
+
764
+ </details>
765
+
766
+ ### Training Logs
767
+ <details><summary>Click to expand</summary>
768
+
769
+ | Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
770
+ |:-------:|:--------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
771
+ | 0.0220 | 5 | 6.6173 | - | - | - | - | - |
772
+ | 0.0441 | 10 | 5.5321 | - | - | - | - | - |
773
+ | 0.0661 | 15 | 5.656 | - | - | - | - | - |
774
+ | 0.0881 | 20 | 4.9256 | - | - | - | - | - |
775
+ | 0.1101 | 25 | 5.0757 | - | - | - | - | - |
776
+ | 0.1322 | 30 | 5.2047 | - | - | - | - | - |
777
+ | 0.1542 | 35 | 5.1307 | - | - | - | - | - |
778
+ | 0.1762 | 40 | 4.9219 | - | - | - | - | - |
779
+ | 0.1982 | 45 | 5.1957 | - | - | - | - | - |
780
+ | 0.2203 | 50 | 5.36 | - | - | - | - | - |
781
+ | 0.2423 | 55 | 3.0865 | - | - | - | - | - |
782
+ | 0.2643 | 60 | 3.7054 | - | - | - | - | - |
783
+ | 0.2863 | 65 | 2.9541 | - | - | - | - | - |
784
+ | 0.3084 | 70 | 3.5521 | - | - | - | - | - |
785
+ | 0.3304 | 75 | 3.5665 | - | - | - | - | - |
786
+ | 0.3524 | 80 | 2.9532 | - | - | - | - | - |
787
+ | 0.3744 | 85 | 2.5121 | - | - | - | - | - |
788
+ | 0.3965 | 90 | 3.1269 | - | - | - | - | - |
789
+ | 0.4185 | 95 | 3.4048 | - | - | - | - | - |
790
+ | 0.4405 | 100 | 2.8126 | - | - | - | - | - |
791
+ | 0.4626 | 105 | 1.6847 | - | - | - | - | - |
792
+ | 0.4846 | 110 | 1.3331 | - | - | - | - | - |
793
+ | 0.5066 | 115 | 2.4799 | - | - | - | - | - |
794
+ | 0.5286 | 120 | 2.1176 | - | - | - | - | - |
795
+ | 0.5507 | 125 | 2.4249 | - | - | - | - | - |
796
+ | 0.5727 | 130 | 3.3705 | - | - | - | - | - |
797
+ | 0.5947 | 135 | 1.551 | - | - | - | - | - |
798
+ | 0.6167 | 140 | 1.328 | - | - | - | - | - |
799
+ | 0.6388 | 145 | 1.9353 | - | - | - | - | - |
800
+ | 0.6608 | 150 | 2.4254 | - | - | - | - | - |
801
+ | 0.6828 | 155 | 1.8436 | - | - | - | - | - |
802
+ | 0.7048 | 160 | 1.1937 | - | - | - | - | - |
803
+ | 0.7269 | 165 | 2.164 | - | - | - | - | - |
804
+ | 0.7489 | 170 | 2.2921 | - | - | - | - | - |
805
+ | 0.7709 | 175 | 2.4385 | - | - | - | - | - |
806
+ | 0.7930 | 180 | 1.2392 | - | - | - | - | - |
807
+ | 0.8150 | 185 | 1.0472 | - | - | - | - | - |
808
+ | 0.8370 | 190 | 1.5844 | - | - | - | - | - |
809
+ | 0.8590 | 195 | 1.2492 | - | - | - | - | - |
810
+ | 0.8811 | 200 | 1.6774 | - | - | - | - | - |
811
+ | 0.9031 | 205 | 2.485 | - | - | - | - | - |
812
+ | 0.9251 | 210 | 2.4781 | - | - | - | - | - |
813
+ | 0.9471 | 215 | 2.4476 | - | - | - | - | - |
814
+ | 0.9692 | 220 | 2.6243 | - | - | - | - | - |
815
+ | 0.9912 | 225 | 1.3651 | - | - | - | - | - |
816
+ | 1.0 | 227 | - | 0.9066 | 0.9112 | 0.9257 | 0.8906 | 0.9182 |
817
+ | 1.0132 | 230 | 1.0575 | - | - | - | - | - |
818
+ | 1.0352 | 235 | 1.4499 | - | - | - | - | - |
819
+ | 1.0573 | 240 | 1.4333 | - | - | - | - | - |
820
+ | 1.0793 | 245 | 1.1148 | - | - | - | - | - |
821
+ | 1.1013 | 250 | 1.259 | - | - | - | - | - |
822
+ | 1.1233 | 255 | 0.873 | - | - | - | - | - |
823
+ | 1.1454 | 260 | 1.646 | - | - | - | - | - |
824
+ | 1.1674 | 265 | 1.7583 | - | - | - | - | - |
825
+ | 1.1894 | 270 | 1.2268 | - | - | - | - | - |
826
+ | 1.2115 | 275 | 1.3792 | - | - | - | - | - |
827
+ | 1.2335 | 280 | 2.5662 | - | - | - | - | - |
828
+ | 1.2555 | 285 | 1.5021 | - | - | - | - | - |
829
+ | 1.2775 | 290 | 1.1399 | - | - | - | - | - |
830
+ | 1.2996 | 295 | 1.3307 | - | - | - | - | - |
831
+ | 1.3216 | 300 | 0.7458 | - | - | - | - | - |
832
+ | 1.3436 | 305 | 1.1029 | - | - | - | - | - |
833
+ | 1.3656 | 310 | 1.0205 | - | - | - | - | - |
834
+ | 1.3877 | 315 | 1.0998 | - | - | - | - | - |
835
+ | 1.4097 | 320 | 0.8304 | - | - | - | - | - |
836
+ | 1.4317 | 325 | 1.3673 | - | - | - | - | - |
837
+ | 1.4537 | 330 | 2.4445 | - | - | - | - | - |
838
+ | 1.4758 | 335 | 2.8757 | - | - | - | - | - |
839
+ | 1.4978 | 340 | 1.7879 | - | - | - | - | - |
840
+ | 1.5198 | 345 | 1.1255 | - | - | - | - | - |
841
+ | 1.5419 | 350 | 1.6743 | - | - | - | - | - |
842
+ | 1.5639 | 355 | 1.3803 | - | - | - | - | - |
843
+ | 1.5859 | 360 | 1.1998 | - | - | - | - | - |
844
+ | 1.6079 | 365 | 1.2129 | - | - | - | - | - |
845
+ | 1.6300 | 370 | 1.6588 | - | - | - | - | - |
846
+ | 1.6520 | 375 | 0.9827 | - | - | - | - | - |
847
+ | 1.6740 | 380 | 0.605 | - | - | - | - | - |
848
+ | 1.6960 | 385 | 1.2934 | - | - | - | - | - |
849
+ | 1.7181 | 390 | 1.1776 | - | - | - | - | - |
850
+ | 1.7401 | 395 | 1.445 | - | - | - | - | - |
851
+ | 1.7621 | 400 | 0.6393 | - | - | - | - | - |
852
+ | 1.7841 | 405 | 0.9303 | - | - | - | - | - |
853
+ | 1.8062 | 410 | 0.7541 | - | - | - | - | - |
854
+ | 1.8282 | 415 | 0.5413 | - | - | - | - | - |
855
+ | 1.8502 | 420 | 1.5258 | - | - | - | - | - |
856
+ | 1.8722 | 425 | 1.4257 | - | - | - | - | - |
857
+ | 1.8943 | 430 | 1.3111 | - | - | - | - | - |
858
+ | 1.9163 | 435 | 1.6604 | - | - | - | - | - |
859
+ | 1.9383 | 440 | 1.4004 | - | - | - | - | - |
860
+ | 1.9604 | 445 | 2.7186 | - | - | - | - | - |
861
+ | 1.9824 | 450 | 2.2757 | - | - | - | - | - |
862
+ | 2.0 | 454 | - | 0.9401 | 0.9433 | 0.9387 | 0.9386 | 0.9416 |
863
+ | 2.0044 | 455 | 0.9345 | - | - | - | - | - |
864
+ | 2.0264 | 460 | 0.9325 | - | - | - | - | - |
865
+ | 2.0485 | 465 | 1.2434 | - | - | - | - | - |
866
+ | 2.0705 | 470 | 1.5161 | - | - | - | - | - |
867
+ | 2.0925 | 475 | 2.6011 | - | - | - | - | - |
868
+ | 2.1145 | 480 | 1.8276 | - | - | - | - | - |
869
+ | 2.1366 | 485 | 1.5005 | - | - | - | - | - |
870
+ | 2.1586 | 490 | 0.8618 | - | - | - | - | - |
871
+ | 2.1806 | 495 | 2.1422 | - | - | - | - | - |
872
+ | 2.2026 | 500 | 1.3922 | - | - | - | - | - |
873
+ | 2.2247 | 505 | 1.5939 | - | - | - | - | - |
874
+ | 2.2467 | 510 | 1.3021 | - | - | - | - | - |
875
+ | 2.2687 | 515 | 1.0825 | - | - | - | - | - |
876
+ | 2.2907 | 520 | 0.9066 | - | - | - | - | - |
877
+ | 2.3128 | 525 | 0.7717 | - | - | - | - | - |
878
+ | 2.3348 | 530 | 1.1484 | - | - | - | - | - |
879
+ | 2.3568 | 535 | 1.6513 | - | - | - | - | - |
880
+ | 2.3789 | 540 | 1.7267 | - | - | - | - | - |
881
+ | 2.4009 | 545 | 0.7659 | - | - | - | - | - |
882
+ | 2.4229 | 550 | 2.0213 | - | - | - | - | - |
883
+ | 2.4449 | 555 | 0.5329 | - | - | - | - | - |
884
+ | 2.4670 | 560 | 1.2083 | - | - | - | - | - |
885
+ | 2.4890 | 565 | 1.5432 | - | - | - | - | - |
886
+ | 2.5110 | 570 | 0.5423 | - | - | - | - | - |
887
+ | 2.5330 | 575 | 0.2613 | - | - | - | - | - |
888
+ | 2.5551 | 580 | 0.7985 | - | - | - | - | - |
889
+ | 2.5771 | 585 | 0.3003 | - | - | - | - | - |
890
+ | 2.5991 | 590 | 2.2234 | - | - | - | - | - |
891
+ | 2.6211 | 595 | 0.4772 | - | - | - | - | - |
892
+ | 2.6432 | 600 | 1.0158 | - | - | - | - | - |
893
+ | 2.6652 | 605 | 2.6385 | - | - | - | - | - |
894
+ | 2.6872 | 610 | 0.7042 | - | - | - | - | - |
895
+ | 2.7093 | 615 | 1.1469 | - | - | - | - | - |
896
+ | 2.7313 | 620 | 1.4092 | - | - | - | - | - |
897
+ | 2.7533 | 625 | 0.6487 | - | - | - | - | - |
898
+ | 2.7753 | 630 | 1.218 | - | - | - | - | - |
899
+ | 2.7974 | 635 | 1.1509 | - | - | - | - | - |
900
+ | 2.8194 | 640 | 1.1524 | - | - | - | - | - |
901
+ | 2.8414 | 645 | 0.6477 | - | - | - | - | - |
902
+ | 2.8634 | 650 | 0.6295 | - | - | - | - | - |
903
+ | 2.8855 | 655 | 1.3026 | - | - | - | - | - |
904
+ | 2.9075 | 660 | 1.9196 | - | - | - | - | - |
905
+ | 2.9295 | 665 | 1.3743 | - | - | - | - | - |
906
+ | 2.9515 | 670 | 0.8934 | - | - | - | - | - |
907
+ | 2.9736 | 675 | 1.1801 | - | - | - | - | - |
908
+ | 2.9956 | 680 | 1.2952 | - | - | - | - | - |
909
+ | 3.0 | 681 | - | 0.9538 | 0.9513 | 0.9538 | 0.9414 | 0.9435 |
910
+ | 3.0176 | 685 | 0.3324 | - | - | - | - | - |
911
+ | 3.0396 | 690 | 0.9551 | - | - | - | - | - |
912
+ | 3.0617 | 695 | 0.9315 | - | - | - | - | - |
913
+ | 3.0837 | 700 | 1.3611 | - | - | - | - | - |
914
+ | 3.1057 | 705 | 1.4406 | - | - | - | - | - |
915
+ | 3.1278 | 710 | 0.5888 | - | - | - | - | - |
916
+ | 3.1498 | 715 | 0.9149 | - | - | - | - | - |
917
+ | 3.1718 | 720 | 0.5627 | - | - | - | - | - |
918
+ | 3.1938 | 725 | 1.6876 | - | - | - | - | - |
919
+ | 3.2159 | 730 | 1.1366 | - | - | - | - | - |
920
+ | 3.2379 | 735 | 1.3571 | - | - | - | - | - |
921
+ | 3.2599 | 740 | 1.5227 | - | - | - | - | - |
922
+ | 3.2819 | 745 | 2.5139 | - | - | - | - | - |
923
+ | 3.3040 | 750 | 0.3735 | - | - | - | - | - |
924
+ | 3.3260 | 755 | 1.4386 | - | - | - | - | - |
925
+ | 3.3480 | 760 | 0.3838 | - | - | - | - | - |
926
+ | 3.3700 | 765 | 0.3973 | - | - | - | - | - |
927
+ | 3.3921 | 770 | 1.4972 | - | - | - | - | - |
928
+ | 3.4141 | 775 | 1.5118 | - | - | - | - | - |
929
+ | 3.4361 | 780 | 0.478 | - | - | - | - | - |
930
+ | 3.4581 | 785 | 1.5982 | - | - | - | - | - |
931
+ | 3.4802 | 790 | 0.6209 | - | - | - | - | - |
932
+ | 3.5022 | 795 | 0.5902 | - | - | - | - | - |
933
+ | 3.5242 | 800 | 1.0877 | - | - | - | - | - |
934
+ | 3.5463 | 805 | 0.9553 | - | - | - | - | - |
935
+ | 3.5683 | 810 | 0.3054 | - | - | - | - | - |
936
+ | 3.5903 | 815 | 1.2229 | - | - | - | - | - |
937
+ | 3.6123 | 820 | 0.7434 | - | - | - | - | - |
938
+ | 3.6344 | 825 | 1.5447 | - | - | - | - | - |
939
+ | 3.6564 | 830 | 1.0751 | - | - | - | - | - |
940
+ | 3.6784 | 835 | 0.8161 | - | - | - | - | - |
941
+ | 3.7004 | 840 | 0.4382 | - | - | - | - | - |
942
+ | 3.7225 | 845 | 1.3547 | - | - | - | - | - |
943
+ | 3.7445 | 850 | 1.7112 | - | - | - | - | - |
944
+ | 3.7665 | 855 | 0.5362 | - | - | - | - | - |
945
+ | 3.7885 | 860 | 0.9309 | - | - | - | - | - |
946
+ | 3.8106 | 865 | 1.8301 | - | - | - | - | - |
947
+ | 3.8326 | 870 | 1.5554 | - | - | - | - | - |
948
+ | 3.8546 | 875 | 1.4035 | - | - | - | - | - |
949
+ | 3.8767 | 880 | 1.5814 | - | - | - | - | - |
950
+ | 3.8987 | 885 | 0.7283 | - | - | - | - | - |
951
+ | 3.9207 | 890 | 1.8549 | - | - | - | - | - |
952
+ | 3.9427 | 895 | 0.196 | - | - | - | - | - |
953
+ | 3.9648 | 900 | 1.2072 | - | - | - | - | - |
954
+ | 3.9868 | 905 | 0.83 | - | - | - | - | - |
955
+ | 4.0 | 908 | - | 0.9564 | 0.9587 | 0.9612 | 0.9488 | 0.9563 |
956
+ | 4.0088 | 910 | 1.7222 | - | - | - | - | - |
957
+ | 4.0308 | 915 | 0.6728 | - | - | - | - | - |
958
+ | 4.0529 | 920 | 0.9388 | - | - | - | - | - |
959
+ | 4.0749 | 925 | 0.7998 | - | - | - | - | - |
960
+ | 4.0969 | 930 | 1.1561 | - | - | - | - | - |
961
+ | 4.1189 | 935 | 2.4315 | - | - | - | - | - |
962
+ | 4.1410 | 940 | 1.3263 | - | - | - | - | - |
963
+ | 4.1630 | 945 | 1.2374 | - | - | - | - | - |
964
+ | 4.1850 | 950 | 1.1307 | - | - | - | - | - |
965
+ | 4.2070 | 955 | 0.5512 | - | - | - | - | - |
966
+ | 4.2291 | 960 | 1.3266 | - | - | - | - | - |
967
+ | 4.2511 | 965 | 1.2306 | - | - | - | - | - |
968
+ | 4.2731 | 970 | 1.7083 | - | - | - | - | - |
969
+ | 4.2952 | 975 | 0.7028 | - | - | - | - | - |
970
+ | 4.3172 | 980 | 1.2987 | - | - | - | - | - |
971
+ | 4.3392 | 985 | 1.545 | - | - | - | - | - |
972
+ | 4.3612 | 990 | 1.004 | - | - | - | - | - |
973
+ | 4.3833 | 995 | 0.8276 | - | - | - | - | - |
974
+ | 4.4053 | 1000 | 1.4694 | - | - | - | - | - |
975
+ | 4.4273 | 1005 | 0.4914 | - | - | - | - | - |
976
+ | 4.4493 | 1010 | 0.9894 | - | - | - | - | - |
977
+ | 4.4714 | 1015 | 0.8855 | - | - | - | - | - |
978
+ | 4.4934 | 1020 | 1.1339 | - | - | - | - | - |
979
+ | 4.5154 | 1025 | 1.0786 | - | - | - | - | - |
980
+ | 4.5374 | 1030 | 1.2547 | - | - | - | - | - |
981
+ | 4.5595 | 1035 | 0.5312 | - | - | - | - | - |
982
+ | 4.5815 | 1040 | 1.4938 | - | - | - | - | - |
983
+ | 4.6035 | 1045 | 0.8124 | - | - | - | - | - |
984
+ | 4.6256 | 1050 | 1.2401 | - | - | - | - | - |
985
+ | 4.6476 | 1055 | 1.1902 | - | - | - | - | - |
986
+ | 4.6696 | 1060 | 1.4183 | - | - | - | - | - |
987
+ | 4.6916 | 1065 | 1.0718 | - | - | - | - | - |
988
+ | 4.7137 | 1070 | 1.2203 | - | - | - | - | - |
989
+ | 4.7357 | 1075 | 0.8535 | - | - | - | - | - |
990
+ | 4.7577 | 1080 | 1.2454 | - | - | - | - | - |
991
+ | 4.7797 | 1085 | 0.4216 | - | - | - | - | - |
992
+ | 4.8018 | 1090 | 0.8327 | - | - | - | - | - |
993
+ | 4.8238 | 1095 | 1.2371 | - | - | - | - | - |
994
+ | 4.8458 | 1100 | 1.0949 | - | - | - | - | - |
995
+ | 4.8678 | 1105 | 1.2177 | - | - | - | - | - |
996
+ | 4.8899 | 1110 | 0.6236 | - | - | - | - | - |
997
+ | 4.9119 | 1115 | 0.646 | - | - | - | - | - |
998
+ | 4.9339 | 1120 | 1.1822 | - | - | - | - | - |
999
+ | 4.9559 | 1125 | 1.0471 | - | - | - | - | - |
1000
+ | 4.9780 | 1130 | 0.7626 | - | - | - | - | - |
1001
+ | **5.0** | **1135** | **0.9794** | **0.9564** | **0.9563** | **0.9616** | **0.9488** | **0.9587** |
1002
+
1003
+ * The bold row denotes the saved checkpoint.
1004
+ </details>
1005
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.42.4
+ - PyTorch: 2.3.1+cu121
+ - Accelerate: 0.32.1
+ - Datasets: 2.21.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "fine-tuned-matryoshka",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.42.4",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.42.4",
+     "pytorch": "2.3.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ebe66a62d30ee433880ae344e1dfd636bd6bb1de801de5df99168dbc62217db
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff