Naruke committed on
Commit
a33ba98
1 Parent(s): 6a3efa3

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": true,
    "pooling_mode_mean_tokens": false,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
README.md ADDED
@@ -0,0 +1,801 @@
---
base_model: BAAI/bge-base-en-v1.5
datasets: []
language:
- en
library_name: sentence-transformers
license: apache-2.0
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: Interest expense increased nominally by 1% from $935 million in
    2022 to $944 million in 2023, and the change reflected only a small adjustment
    in the financial operations.
  sentences:
  - What recent technological advancements has the company implemented in set-top
    box (STB) solutions?
  - How much did the interest expense change from 2022 to 2023?
  - What are the conditions under which AENB is restricted from making dividend distributions
    to TRS without OCC approval?
- source_sentence: Our products are sold in approximately 105 countries.
  sentences:
  - How much were the costs related to the January 2023 restructuring plan?
  - In how many countries are Eli Lilly and Company's products sold?
  - What led to the 74.3% decrease in total net revenues for the Corporate and Other
    segment in fiscal 2023 compared to fiscal 2022?
- source_sentence: Item 8 is numbered as 39 in the document.
  sentences:
  - What number is associated with Item 8 in the document?
  - What was the total amount of fixed lease payment obligations as of December 31,
    2023?
  - By how much would a 25 basis point increase in the expected rate of return on
    assets (ROA) affect the 2024 Pension Expense for U.S. plans?
- source_sentence: The Intelligent Edge business segment under the Aruba brand includes
    a portfolio of solutions for secure edge-to-cloud connectivity, embracing work
    from anywhere environments, mobility, and IoT device connectivity.
  sentences:
  - What types of wireless services does AT&T provide in Mexico?
  - What was the approximate amount of civil penalties agreed upon in the consent
    agreement with the EPA in November 2023?
  - What is the focus of HPE's Intelligent Edge business segment?
- source_sentence: As part of our solar energy system and energy storage contracts,
    we may provide the customer with performance guarantees that commit that the underlying
    system will meet or exceed the minimum energy generation or performance requirements
    specified in the contract.
  sentences:
  - What types of guarantees does Tesla provide to its solar and energy storage customers?
  - How many full-time employees did Microsoft report as of June 30, 2023?
  - How are the details about the company's legal proceedings provided in the report?
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.71
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.84
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8685714285714285
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9142857142857143
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.71
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.28
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1737142857142857
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09142857142857143
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.71
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.84
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8685714285714285
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9142857142857143
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8124537511621754
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7797726757369615
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7826418437079763
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.7042857142857143
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8357142857142857
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8657142857142858
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9114285714285715
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7042857142857143
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2785714285714286
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17314285714285713
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09114285714285714
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7042857142857143
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8357142857142857
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8657142857142858
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9114285714285715
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8077533543226267
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.77450283446712
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7775892822045911
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.7028571428571428
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8228571428571428
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8585714285714285
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8971428571428571
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7028571428571428
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2742857142857143
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1717142857142857
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.0897142857142857
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7028571428571428
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8228571428571428
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8585714285714285
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8971428571428571
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8004396670945336
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7693480725623582
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7733203320348766
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.6771428571428572
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8142857142857143
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8542857142857143
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8971428571428571
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6771428571428572
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2714285714285714
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17085714285714285
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.0897142857142857
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6771428571428572
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8142857142857143
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8542857142857143
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8971428571428571
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.788715031897326
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7538418367346936
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7573369186799356
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.6642857142857143
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7814285714285715
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8128571428571428
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.86
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6642857142857143
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2604761904761905
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16257142857142853
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.086
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6642857142857143
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7814285714285715
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8128571428571428
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.86
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7600084252085629
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7282585034013601
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.733116708012112
      name: Cosine Map@100
---

# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

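For illustration only, the same three-module stack can be assembled by hand from `sentence_transformers.models`. This is a minimal sketch of the architecture shown above (the variable names are ours); in practice you would simply load the published checkpoint as shown in the Usage section below:

```python
from sentence_transformers import SentenceTransformer, models

# BERT backbone from the base checkpoint, truncating inputs to 512 tokens.
transformer = models.Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)

# CLS-token pooling over the 768-dimensional token embeddings,
# matching the 1_Pooling/config.json added in this commit.
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")

# L2-normalisation so that dot product equals cosine similarity.
normalize = models.Normalize()

model = SentenceTransformer(modules=[transformer, pooling, normalize])
```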
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Naruke/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'As part of our solar energy system and energy storage contracts, we may provide the customer with performance guarantees that commit that the underlying system will meet or exceed the minimum energy generation or performance requirements specified in the contract.',
    'What types of guarantees does Tesla provide to its solar and energy storage customers?',
    'How many full-time employees did Microsoft report as of June 30, 2023?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

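Because the model was trained with a Matryoshka loss over dimensions 768, 512, 256, 128 and 64, embeddings can also be truncated to one of the smaller dimensionalities at load time. A minimal sketch (the choice of 256 below is only an example):

```python
from sentence_transformers import SentenceTransformer

# Same checkpoint, but every embedding is truncated to its first 256 dimensions.
model_256 = SentenceTransformer("Naruke/bge-base-financial-matryoshka", truncate_dim=256)

embeddings = model_256.encode(["How much did the interest expense change from 2022 to 2023?"])
print(embeddings.shape)
# (1, 256)
```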
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval
* Dataset: `dim_768`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.71       |
| cosine_accuracy@3   | 0.84       |
| cosine_accuracy@5   | 0.8686     |
| cosine_accuracy@10  | 0.9143     |
| cosine_precision@1  | 0.71       |
| cosine_precision@3  | 0.28       |
| cosine_precision@5  | 0.1737     |
| cosine_precision@10 | 0.0914     |
| cosine_recall@1     | 0.71       |
| cosine_recall@3     | 0.84       |
| cosine_recall@5     | 0.8686     |
| cosine_recall@10    | 0.9143     |
| cosine_ndcg@10      | 0.8125     |
| cosine_mrr@10       | 0.7798     |
| **cosine_map@100**  | **0.7826** |

#### Information Retrieval
* Dataset: `dim_512`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7043     |
| cosine_accuracy@3   | 0.8357     |
| cosine_accuracy@5   | 0.8657     |
| cosine_accuracy@10  | 0.9114     |
| cosine_precision@1  | 0.7043     |
| cosine_precision@3  | 0.2786     |
| cosine_precision@5  | 0.1731     |
| cosine_precision@10 | 0.0911     |
| cosine_recall@1     | 0.7043     |
| cosine_recall@3     | 0.8357     |
| cosine_recall@5     | 0.8657     |
| cosine_recall@10    | 0.9114     |
| cosine_ndcg@10      | 0.8078     |
| cosine_mrr@10       | 0.7745     |
| **cosine_map@100**  | **0.7776** |

#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7029     |
| cosine_accuracy@3   | 0.8229     |
| cosine_accuracy@5   | 0.8586     |
| cosine_accuracy@10  | 0.8971     |
| cosine_precision@1  | 0.7029     |
| cosine_precision@3  | 0.2743     |
| cosine_precision@5  | 0.1717     |
| cosine_precision@10 | 0.0897     |
| cosine_recall@1     | 0.7029     |
| cosine_recall@3     | 0.8229     |
| cosine_recall@5     | 0.8586     |
| cosine_recall@10    | 0.8971     |
| cosine_ndcg@10      | 0.8004     |
| cosine_mrr@10       | 0.7693     |
| **cosine_map@100**  | **0.7733** |

#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6771     |
| cosine_accuracy@3   | 0.8143     |
| cosine_accuracy@5   | 0.8543     |
| cosine_accuracy@10  | 0.8971     |
| cosine_precision@1  | 0.6771     |
| cosine_precision@3  | 0.2714     |
| cosine_precision@5  | 0.1709     |
| cosine_precision@10 | 0.0897     |
| cosine_recall@1     | 0.6771     |
| cosine_recall@3     | 0.8143     |
| cosine_recall@5     | 0.8543     |
| cosine_recall@10    | 0.8971     |
| cosine_ndcg@10      | 0.7887     |
| cosine_mrr@10       | 0.7538     |
| **cosine_map@100**  | **0.7573** |

#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6643     |
| cosine_accuracy@3   | 0.7814     |
| cosine_accuracy@5   | 0.8129     |
| cosine_accuracy@10  | 0.86       |
| cosine_precision@1  | 0.6643     |
| cosine_precision@3  | 0.2605     |
| cosine_precision@5  | 0.1626     |
| cosine_precision@10 | 0.086      |
| cosine_recall@1     | 0.6643     |
| cosine_recall@3     | 0.7814     |
| cosine_recall@5     | 0.8129     |
| cosine_recall@10    | 0.86       |
| cosine_ndcg@10      | 0.76       |
| cosine_mrr@10       | 0.7283     |
| **cosine_map@100**  | **0.7331** |

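The tables above are produced by Sentence Transformers' `InformationRetrievalEvaluator`, run once per Matryoshka dimension. A minimal sketch of re-running such an evaluation on your own data (the tiny `queries`/`corpus`/`relevant_docs` dictionaries below are placeholders, not the actual evaluation split):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("Naruke/bge-base-financial-matryoshka")

# Placeholder data: query id -> query text, passage id -> passage text,
# and query id -> set of relevant passage ids.
queries = {"q1": "How much did the interest expense change from 2022 to 2023?"}
corpus = {"d1": "Interest expense increased nominally by 1% from $935 million in 2022 to $944 million in 2023."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_768",
)
results = evaluator(model)
print(results)  # accuracy@k, precision@k, recall@k, NDCG@10, MRR@10, MAP@100, ...
```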
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 6,300 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:
  |         | positive | anchor |
  |:--------|:---------|:-------|
  | type    | string   | string |
  | details | <ul><li>min: 9 tokens</li><li>mean: 45.57 tokens</li><li>max: 289 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 20.32 tokens</li><li>max: 51 tokens</li></ul> |
* Samples:
  | positive | anchor |
  |:---------|:-------|
  | <code>The detailed information about commitments and contingencies related to legal proceedings is included under Note 13 in Part II, Item 8 of the Annual Report.</code> | <code>Where can detailed information about the commitments and contingencies related to legal proceedings be found in the Annual Report on Form 10-K?</code> |
  | <code>American Express's decision to reinvest gains into its business will depend on regulatory and other approvals, consultation requirements, the execution of ancillary agreements, the cost and availability of financing for the purchaser to fund the transaction and the potential loss of key customers, vendors and other business partners and management’s decisions regarding future operations, strategies and business initiatives.</code> | <code>What factors influence American Express's decision to reinvest gains into its business?</code> |
  | <code>Lease obligations as of June 30, 2023, related to office space and various facilities totaled $883.1 million, with lease terms ranging from one to 21 years and are mostly renewable.</code> | <code>How much were lease obligations related to office space and other facilities as of June 30, 2023, and what were the terms?</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

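In code, this configuration corresponds to wrapping `MultipleNegativesRankingLoss` in `MatryoshkaLoss`. A minimal sketch (assuming `model` is the base model being fine-tuned on the (anchor, positive) pairs described above):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# In-batch-negatives ranking loss over (anchor, positive) pairs ...
inner_loss = MultipleNegativesRankingLoss(model)

# ... applied at every Matryoshka dimension with equal weight, as in the JSON above.
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```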
### Training Hyperparameters
#### Non-Default Hyperparameters

These non-default values map directly onto `SentenceTransformerTrainingArguments`; a configuration sketch follows this list.

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 2
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

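A minimal sketch of passing these values to the Sentence Transformers v3 trainer. The `output_dir` is a placeholder, and `save_strategy="epoch"` is an added assumption (loading the best model at the end requires the save and evaluation strategies to match); the remaining defaults are listed below.

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # placeholder output path
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: required so the best checkpoint can be reloaded
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts within a batch
)
```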
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 2
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step   | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.4061     | 10     | 0.9835        | -                      | -                      | -                      | -                     | -                      |
| 0.8122     | 20     | 0.4319        | -                      | -                      | -                      | -                     | -                      |
| 0.9746     | 24     | -             | 0.7541                 | 0.7729                 | 0.7738                 | 0.7242                | 0.7786                 |
| 1.2183     | 30     | 0.3599        | -                      | -                      | -                      | -                     | -                      |
| 1.6244     | 40     | 0.2596        | -                      | -                      | -                      | -                     | -                      |
| **1.9492** | **48** | **-**         | **0.7573**             | **0.7733**             | **0.7776**             | **0.7331**            | **0.7826**             |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.3.0+cu121
- Accelerate: 0.32.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title = {Matryoshka Representation Learning},
    author = {Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year = {2024},
    eprint = {2205.13147},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title = {Efficient Natural Language Response Suggestion for Smart Reply},
    author = {Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year = {2017},
    eprint = {1705.00652},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,32 @@
{
  "_name_or_path": "BAAI/bge-base-en-v1.5",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.41.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.0.1",
    "transformers": "4.41.2",
    "pytorch": "2.3.0+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6db0fe99c8d6f5aced3b2bd1fb7f5fe5ad0447ec1a7e0383c102ac880c88d81b
size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": true
}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff