YxBxRyXJx committed on
Commit 91b19b4 · verified · 1 Parent(s): 3ce4388

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
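This pooling configuration enables only CLS-token pooling: the sentence embedding is the encoder's hidden state at the [CLS] position, not a mean over tokens. A minimal sketch of that operation with dummy tensors (illustrative only):
```python
import torch

# Dummy encoder output: (batch, seq_len, hidden), hidden = word_embedding_dimension
token_embeddings = torch.randn(2, 16, 768)

# pooling_mode_cls_token = true: keep only the first ([CLS]) token's hidden state
sentence_embeddings = token_embeddings[:, 0]
print(sentence_embeddings.shape)  # torch.Size([2, 768])
```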
README.md ADDED
@@ -0,0 +1,723 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:5600
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-base-en-v1.5
+ widget:
+ - source_sentence: The Federal Energy Regulatory Commission (“FERC”) has also taken
+     steps to enable the participation of energy storage in wholesale energy markets.
+   sentences:
+   - What segment-specific regulations apply to CVS Health Corporation's Pharmacy &
+     Consumer Wellness segment?
+   - What types of contracts does the company have for its health insurance plans,
+     and how does premium revenue recognition function under these contracts?
+   - What federal agency has taken steps to facilitate energy storage participation
+     in wholesale energy markets?
+ - source_sentence: Investments in subsidiaries and partnerships which we do not control
+     but have significant influence are accounted for under the equity method.
+   sentences:
+   - How does the company aim to protect the health and well-being of the communities
+     it operates in?
+   - What are the key factors affecting the evaluation of the Economic Value of Equity
+     (EVE) at the Charles Schwab Corporation?
+   - What accounting method does the company use to account for investments in subsidiaries
+     and partnerships where it does not control but has significant influence?
+ - source_sentence: Item 8 of IBM's 2023 Annual Report includes financial statements
+     and supplementary data spanning pages 44 through 121.
+   sentences:
+   - What entities are included among the Guarantors that guarantee each other’s debt
+     securities as described in Comcast’s 2023 Annual Report?
+   - What uncertainties exist regarding projections of future cash needs and cash flows?
+   - How many pages in IBM's 2023 Annual Report to Stockholders are dedicated to financial
+     statements and supplementary data?
+ - source_sentence: 'Our compensation philosophy creates the framework for our rewards
+     strategy, which focuses on five key elements: pay-for-performance, external market-based
+     research, internal equity, fiscal responsibility, and legal compliance.'
+   sentences:
+   - What financial instruments does the company invest in that are sensitive to interest
+     rates?
+   - What elements are included in the company's compensation programs?
+   - What is the expected maximum potential loss from hurricane events for Chubb as
+     of the end of 2023?
+ - source_sentence: Outside of the U.S., many countries have established vehicle safety
+     standards and regulations and are likely to adopt additional, more stringent requirements
+     in the future.
+   sentences:
+   - What percentage of the company's sales categories in fiscal 2023 were failure
+     and maintenance related?
+   - What competitive factors influence Chubb International's international operations?
+   - What changes are occurring with vehicle safety regulations outside of the U.S.?
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: BGE base Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 768
+       type: dim_768
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6885714285714286
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8278571428571428
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8728571428571429
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9164285714285715
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6885714285714286
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.275952380952381
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17457142857142854
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09164285714285714
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6885714285714286
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8278571428571428
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8728571428571429
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9164285714285715
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8042449175537354
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.768181405895692
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7712863400405022
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 512
+       type: dim_512
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6864285714285714
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8292857142857143
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8728571428571429
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9135714285714286
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6864285714285714
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2764285714285714
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17457142857142854
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09135714285714285
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6864285714285714
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8292857142857143
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8728571428571429
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9135714285714286
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8024352620004916
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7665753968253971
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7697268174707245
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.68
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.825
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8635714285714285
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9042857142857142
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.68
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.275
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.1727142857142857
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09042857142857141
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.68
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.825
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8635714285714285
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9042857142857142
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7955058944909328
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7603066893424041
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7637281364444245
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6621428571428571
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.7964285714285714
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8457142857142858
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8907142857142857
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6621428571428571
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2654761904761905
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16914285714285712
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08907142857142857
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6621428571428571
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.7964285714285714
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8457142857142858
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8907142857142857
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7772894744328753
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7408999433106581
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7449491476160666
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6285714285714286
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.7635714285714286
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8057142857142857
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8642857142857143
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6285714285714286
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2545238095238095
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16114285714285712
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08642857142857142
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6285714285714286
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.7635714285714286
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8057142857142857
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8642857142857143
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7447153698860624
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7067037981859416
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7112341263725279
+       name: Cosine Map@100
+ ---
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - json
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("YxBxRyXJx/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+     'Outside of the U.S., many countries have established vehicle safety standards and regulations and are likely to adopt additional, more stringent requirements in the future.',
+     'What changes are occurring with vehicle safety regulations outside of the U.S.?',
+     "What competitive factors influence Chubb International's international operations?",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
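+
+ Because the model was trained with `MatryoshkaLoss`, its embeddings can be truncated to the smaller evaluated dimensions (512, 256, 128 or 64) with only a modest drop in retrieval quality. A minimal sketch, assuming Sentence Transformers >= 2.7 (which added `truncate_dim`):
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Load the same model, but keep only the first 256 embedding dimensions
+ model = SentenceTransformer("YxBxRyXJx/bge-base-financial-matryoshka", truncate_dim=256)
+ embeddings = model.encode(["What federal agency regulates wholesale energy markets?"])
+ print(embeddings.shape)
+ # (1, 256)
+ ```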
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | dim_768    | dim_512    | dim_256    | dim_128    | dim_64     |
+ |:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
+ | cosine_accuracy@1   | 0.6886     | 0.6864     | 0.68       | 0.6621     | 0.6286     |
+ | cosine_accuracy@3   | 0.8279     | 0.8293     | 0.825      | 0.7964     | 0.7636     |
+ | cosine_accuracy@5   | 0.8729     | 0.8729     | 0.8636     | 0.8457     | 0.8057     |
+ | cosine_accuracy@10  | 0.9164     | 0.9136     | 0.9043     | 0.8907     | 0.8643     |
+ | cosine_precision@1  | 0.6886     | 0.6864     | 0.68       | 0.6621     | 0.6286     |
+ | cosine_precision@3  | 0.276      | 0.2764     | 0.275      | 0.2655     | 0.2545     |
+ | cosine_precision@5  | 0.1746     | 0.1746     | 0.1727     | 0.1691     | 0.1611     |
+ | cosine_precision@10 | 0.0916     | 0.0914     | 0.0904     | 0.0891     | 0.0864     |
+ | cosine_recall@1     | 0.6886     | 0.6864     | 0.68       | 0.6621     | 0.6286     |
+ | cosine_recall@3     | 0.8279     | 0.8293     | 0.825      | 0.7964     | 0.7636     |
+ | cosine_recall@5     | 0.8729     | 0.8729     | 0.8636     | 0.8457     | 0.8057     |
+ | cosine_recall@10    | 0.9164     | 0.9136     | 0.9043     | 0.8907     | 0.8643     |
+ | **cosine_ndcg@10**  | **0.8042** | **0.8024** | **0.7955** | **0.7773** | **0.7447** |
+ | cosine_mrr@10       | 0.7682     | 0.7666     | 0.7603     | 0.7409     | 0.7067     |
+ | cosine_map@100      | 0.7713     | 0.7697     | 0.7637     | 0.7449     | 0.7112     |
+
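+ For illustration, the per-dimension scores above can be reproduced with one evaluator per Matryoshka dimension. A sketch under assumed names (`queries`, `corpus` and `relevant_docs` are placeholders for a held-out evaluation split, which is not shipped with the model):
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ model = SentenceTransformer("YxBxRyXJx/bge-base-financial-matryoshka")
+
+ # queries: {query_id: text}, corpus: {doc_id: text},
+ # relevant_docs: {query_id: {doc_id, ...}} -- placeholders
+ for dim in [768, 512, 256, 128, 64]:
+     evaluator = InformationRetrievalEvaluator(
+         queries=queries,
+         corpus=corpus,
+         relevant_docs=relevant_docs,
+         name=f"dim_{dim}",
+         truncate_dim=dim,  # score using only the first `dim` dimensions
+     )
+     print(evaluator(model))
+ ```
+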
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### json
+
+ * Dataset: json
+ * Size: 5,600 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | positive                                                                            | anchor                                                                             |
+   |:--------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                             |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 44.34 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 20.46 tokens</li><li>max: 46 tokens</li></ul> |
+ * Samples:
+   | positive | anchor |
+   |:---------|:-------|
+   | <code>Z-net is AutoZone's proprietary electronic catalog and enables AutoZoners to efficiently look up parts that customers need, providing complete job solutions and information based on vehicle specifics. It also tracks inventory availability across different locations.</code> | <code>What is the purpose of Z-net in AutoZone stores?</code> |
+   | <code>In 2023, the allowance for loan and lease losses was $13.3 billion on total loans and leases of $1,050.2 billion, which excludes loans accounted for under the fair value option.</code> | <code>What was the total amount of loans and leases at Bank of America by the end of 2023, excluding those accounted for under the fair value option?</code> |
+   | <code>We significantly improved features in Service Manager™, which installers can use from their mobile devices to get service instantly. We continue to provide 24/7 support for installers and Enphase system owners globally across our phone, online chat, and email communications channel. We continue to train our customer service agents with a goal of reducing average customer wait times to under one minute, and we continue to expand our network of field service technicians in the United States, Europe and Australia to provide direct homeowner assistance.</code> | <code>What measures has Enphase Energy, Inc. taken to improve customer service in 2023?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
+
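+ For illustration, this configuration corresponds to the following loss construction in Sentence Transformers (a minimal sketch of the loss only, not the full training loop):
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("BAAI/bge-base-en-v1.5")
+
+ # In-batch negatives ranking loss, wrapped so it is applied at every
+ # Matryoshka dimension with equal weight
+ base_loss = MultipleNegativesRankingLoss(model)
+ loss = MatryoshkaLoss(
+     model,
+     base_loss,
+     matryoshka_dims=[768, 512, 256, 128, 64],
+ )
+ ```
+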
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 2
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `tf32`: True
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
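+ For illustration, these non-default values map onto `SentenceTransformerTrainingArguments` roughly as follows (a sketch; the output directory name is a placeholder):
+ ```python
+ from sentence_transformers import SentenceTransformerTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="bge-base-financial-matryoshka",  # placeholder
+     eval_strategy="epoch",
+     per_device_train_batch_size=32,
+     per_device_eval_batch_size=16,
+     gradient_accumulation_steps=16,
+     learning_rate=2e-5,
+     num_train_epochs=2,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     bf16=True,
+     tf32=True,
+     load_best_model_at_end=True,
+     optim="adamw_torch_fused",
+     batch_sampler=BatchSamplers.NO_DUPLICATES,
+ )
+ ```
+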
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 2
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: True
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch      | Step   | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
+ |:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
+ | 0.9143     | 10     | 1.4537        | 0.7992                 | 0.7952                 | 0.7900                 | 0.7703                 | 0.7350                |
+ | **1.8286** | **20** | **0.6857**    | **0.8042**             | **0.8024**             | **0.7955**             | **0.7773**             | **0.7447**            |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.3.0
+ - Transformers: 4.46.2
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.1.1
+ - Datasets: 3.1.0
+ - Tokenizers: 0.20.3
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.46.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
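This is the standard BERT-base configuration inherited from BAAI/bge-base-en-v1.5 (12 layers, 12 attention heads, 768 hidden units). A quick sanity check with the `transformers` library (illustrative only):
```python
from transformers import AutoConfig

# Read the config shipped with this repository
config = AutoConfig.from_pretrained("YxBxRyXJx/bge-base-financial-matryoshka")
print(config.model_type, config.num_hidden_layers, config.hidden_size)
# bert 12 768
```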
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.3.0",
+     "transformers": "4.46.2",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cedffab57927f5d7da06a4edcfd07c43b57d253d764e1484a4d1c765692b5e1
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
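modules.json wires the three modules (Transformer → Pooling → Normalize) into the inference pipeline. For illustration, an equivalent pipeline could be assembled by hand (a sketch, not how this checkpoint was built):
```python
from sentence_transformers import SentenceTransformer, models

word = models.Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)
pooling = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="cls")
normalize = models.Normalize()

# Same three-stage pipeline that modules.json declares
model = SentenceTransformer(modules=[word, pooling, normalize])
```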
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff