Ram934 committed
Commit fd1165c · verified · 1 Parent(s): 1aaaca1

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,725 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:6300
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-base-en-v1.5
+ widget:
+ - source_sentence: The cumulative basis adjustments associated with these hedging
+     relationships are a reduction of the amortized cost basis of the closed portfolios
+     of $19 million.
+   sentences:
+   - What are the main factors that influence the timing and cost of the company's
+     inventory purchases?
+   - What was the reduction in the amortized cost basis of the closed portfolios due
+     to cumulative basis adjustments in these hedging relationships?
+   - What was Garmin Ltd.'s net income for the fiscal year ended December 30, 2023?
+ - source_sentence: 'The components of the provision for income taxes were as follows:
+     U.S. Federal $ (314,757), U.S. State and Local $ (85,355), Foreign $ (1,162).
+     Effective income tax rate | 24.2% | | 23.9% | | ''19.7% | for the years 2021,
+     2022, and 2023.'
+   sentences:
+   - How much of the lease obligations is payable within 12 months as of December 31,
+     2023?
+   - What are the components and the effective tax rates for the year 2023 as reported
+     in the financial statements?
+   - How many Dollar Tree Plus stores were there as of January 28, 2023?
+ - source_sentence: The Company may receive advanced royalty payments from licensees,
+     either in advance of a licensee’s subsequent sales to customers or, prior to the
+     completion of the Company’s performance obligation. The Wizards of the Coast and
+     Digital Gaming segment may also receive advanced payments from end users of its
+     digital games at the time of the initial purchase, through in-application purchases,
+     or through subscription services. Revenues on all licensee and digital gaming
+     advanced payments are deferred until the respective performance obligations are
+     satisfied, and these digital gaming revenues are recognized over a period of time,
+     determined based on either player usage patterns or the estimated playing life
+     of the user, or when additional downloadable content is made available, or as
+     with subscription services, ratably over the subscription term.
+   sentences:
+   - How does the Company recognize revenue from advanced royalty payments and digital
+     game purchases?
+   - What is the primary role of Canopy technology in the Health Services segment?
+   - Which section of a financial document provides an index to Financial Statements
+     and Supplementary Data?
+ - source_sentence: Item 8 covers Financial Statements and Supplementary Data.
+   sentences:
+   - How much did the prepaid expenses increase from 2022 to 2023?
+   - What strategies are outlined in the Company's human capital management?
+   - What type of data does Item 8 cover in the company's filing?
+ - source_sentence: When points are issued as a result of a stay by a Hilton Honors
+     member at an owned or leased hotel, we recognize a reduction in owned and leased
+     hotels revenues, since we are also the program sponsor.
+   sentences:
+   - What financial impact does the redemption of Hilton Honors points have on the
+     revenue of owned and leased hotels?
+   - What original companies formed IBM in 1911?
+   - What was the global gender equity status at Meta in July 2023?
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: BGE base Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 768
+       type: dim_768
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6714285714285714
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8114285714285714
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8485714285714285
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6714285714285714
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2704761904761904
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16971428571428568
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6714285714285714
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8114285714285714
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8485714285714285
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7869239024966277
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7507120181405897
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7550416257512982
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 512
+       type: dim_512
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6657142857142857
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.81
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8542857142857143
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8928571428571429
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6657142857142857
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.27
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17085714285714285
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08928571428571426
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6657142857142857
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.81
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8542857142857143
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8928571428571429
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7812019485050782
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7451230158730157
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7500357971583163
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6628571428571428
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.7928571428571428
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8428571428571429
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8842857142857142
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6628571428571428
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2642857142857143
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16857142857142854
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08842857142857141
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6628571428571428
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.7928571428571428
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8428571428571429
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8842857142857142
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7743199196082401
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7389903628117913
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7442531468911058
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6671428571428571
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.77
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8228571428571428
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8685714285714285
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6671428571428571
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.25666666666666665
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16457142857142856
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08685714285714285
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6671428571428571
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.77
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8228571428571428
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8685714285714285
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7655373626539865
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7328270975056688
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7378874490017019
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6285714285714286
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.75
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.7842857142857143
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8285714285714286
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6285714285714286
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.25
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.15685714285714283
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08285714285714285
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6285714285714286
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.75
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.7842857142857143
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8285714285714286
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7300345502506145
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.6984109977324261
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7040560866496234
+       name: Cosine Map@100
+ ---
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - json
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
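+ In other words: token embeddings come from `BertModel`, the sentence embedding is the `[CLS]` token vector (CLS pooling), and the result is L2-normalized, so dot product equals cosine similarity. As a rough sketch of the equivalent pipeline in plain `transformers` (for illustration only; the Sentence Transformers API below is the supported path):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModel, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("Ram934/bge-base-financial-matryoshka")
+ bert = AutoModel.from_pretrained("Ram934/bge-base-financial-matryoshka")
+
+ inputs = tokenizer(["Item 8 covers Financial Statements and Supplementary Data."],
+                    padding=True, truncation=True, max_length=512, return_tensors="pt")
+ with torch.no_grad():
+     token_embeddings = bert(**inputs).last_hidden_state  # (0) Transformer
+ cls_embedding = token_embeddings[:, 0]                   # (1) CLS-token pooling
+ embedding = F.normalize(cls_embedding, p=2, dim=1)       # (2) Normalize
+ ```
+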
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("Ram934/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+     'When points are issued as a result of a stay by a Hilton Honors member at an owned or leased hotel, we recognize a reduction in owned and leased hotels revenues, since we are also the program sponsor.',
+     'What financial impact does the redemption of Hilton Honors points have on the revenue of owned and leased hotels?',
+     'What original companies formed IBM in 1911?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
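+
+ Because the model was trained with `MatryoshkaLoss`, its embeddings can also be truncated to 512, 256, 128, or 64 dimensions with only a modest drop in retrieval quality (see the Evaluation section). A sketch using the `truncate_dim` argument available in recent Sentence Transformers versions:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Keep only the first 256 dimensions of each embedding
+ model = SentenceTransformer("Ram934/bge-base-financial-matryoshka", truncate_dim=256)
+ embeddings = model.encode(["What type of data does Item 8 cover in the company's filing?"])
+ print(embeddings.shape)
+ # (1, 256)
+ ```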
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | dim_768    | dim_512    | dim_256    | dim_128    | dim_64   |
+ |:--------------------|:-----------|:-----------|:-----------|:-----------|:---------|
+ | cosine_accuracy@1   | 0.6714     | 0.6657     | 0.6629     | 0.6671     | 0.6286   |
+ | cosine_accuracy@3   | 0.8114     | 0.81       | 0.7929     | 0.77       | 0.75     |
+ | cosine_accuracy@5   | 0.8486     | 0.8543     | 0.8429     | 0.8229     | 0.7843   |
+ | cosine_accuracy@10  | 0.9        | 0.8929     | 0.8843     | 0.8686     | 0.8286   |
+ | cosine_precision@1  | 0.6714     | 0.6657     | 0.6629     | 0.6671     | 0.6286   |
+ | cosine_precision@3  | 0.2705     | 0.27       | 0.2643     | 0.2567     | 0.25     |
+ | cosine_precision@5  | 0.1697     | 0.1709     | 0.1686     | 0.1646     | 0.1569   |
+ | cosine_precision@10 | 0.09       | 0.0893     | 0.0884     | 0.0869     | 0.0829   |
+ | cosine_recall@1     | 0.6714     | 0.6657     | 0.6629     | 0.6671     | 0.6286   |
+ | cosine_recall@3     | 0.8114     | 0.81       | 0.7929     | 0.77       | 0.75     |
+ | cosine_recall@5     | 0.8486     | 0.8543     | 0.8429     | 0.8229     | 0.7843   |
+ | cosine_recall@10    | 0.9        | 0.8929     | 0.8843     | 0.8686     | 0.8286   |
+ | **cosine_ndcg@10**  | **0.7869** | **0.7812** | **0.7743** | **0.7655** | **0.73** |
+ | cosine_mrr@10       | 0.7507     | 0.7451     | 0.739      | 0.7328     | 0.6984   |
+ | cosine_map@100      | 0.755      | 0.75       | 0.7443     | 0.7379     | 0.7041   |
+
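+ These numbers come from `InformationRetrievalEvaluator`; a minimal sketch of how such an evaluation is run (the query/corpus ids and texts below are illustrative toy data, not the actual evaluation split):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ queries = {"q1": "What is the trading symbol for Costco's common stock?"}
+ corpus = {"d1": "Our common stock trades on the NASDAQ Global Select Market, under the symbol 'COST.'"}
+ relevant_docs = {"q1": {"d1"}}  # which corpus ids are relevant to each query
+
+ model = SentenceTransformer("Ram934/bge-base-financial-matryoshka")
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_768")
+ results = evaluator(model)  # dict of accuracy@k, precision@k, recall@k, NDCG@10, MRR@10, MAP@100
+ ```
+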
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### json
+
+ * Dataset: json
+ * Size: 6,300 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | positive                                                                            | anchor                                                                            |
+   |:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                            |
+   | details | <ul><li>min: 9 tokens</li><li>mean: 46.56 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 20.58 tokens</li><li>max: 51 tokens</li></ul> |
+ * Samples:
+   | positive | anchor |
+   |:---------|:-------|
+   | <code>All of our Company’s facilities and other operations in the United States and elsewhere around the world are subject to various environmental protection statutes and regulations, including those relating to the use and treatment of water resources, discharge of wastewater, and air emissions.</code> | <code>What types of environmental regulations does the company need to comply with?</code> |
+   | <code>Domestically, diesel fuel prices were higher in fiscal 2022 than in the prior year and may increase further in fiscal 2023 because of international tensions.</code> | <code>How did diesel fuel prices affect the company’s freight costs in fiscal 2022?</code> |
+   | <code>Our common stock trades on the NASDAQ Global Select Market, under the symbol “COST.”</code> | <code>What is the trading symbol for Costco's common stock on the NASDAQ Global Select Market?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+ ```
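+
+ Equivalently in code, the wrapper loss is constructed around the ranking loss; a sketch mirroring the parameters above:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("BAAI/bge-base-en-v1.5")
+ inner_loss = MultipleNegativesRankingLoss(model)
+ # Apply the same ranking loss at each truncated embedding size
+ loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
+ ```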
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `tf32`: False
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
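+ As a sketch, these settings map onto `SentenceTransformerTrainingArguments` roughly as follows (`output_dir` is a placeholder, not the original training path):
+
+ ```python
+ from sentence_transformers import SentenceTransformerTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="bge-base-financial-matryoshka",
+     eval_strategy="epoch",
+     per_device_train_batch_size=32,
+     per_device_eval_batch_size=16,
+     gradient_accumulation_steps=16,
+     learning_rate=2e-5,
+     num_train_epochs=4,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     tf32=False,
+     load_best_model_at_end=True,
+     optim="adamw_torch_fused",
+     batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid in-batch false negatives
+ )
+ ```
+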
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: False
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch    | Step   | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
+ |:--------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
+ | 0.96     | 3      | -             | 0.7681                 | 0.7635                 | 0.7543                 | 0.7381                 | 0.6883                |
+ | 1.92     | 6      | -             | 0.7812                 | 0.7747                 | 0.7706                 | 0.7602                 | 0.7197                |
+ | 2.88     | 9      | -             | 0.7848                 | 0.7806                 | 0.7744                 | 0.7635                 | 0.7286                |
+ | 3.2      | 10     | 3.2955        | -                      | -                      | -                      | -                      | -                     |
+ | **3.84** | **12** | **-**         | **0.7869**             | **0.7812**             | **0.7743**             | **0.7655**             | **0.73**              |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.10.14
+ - Sentence Transformers: 3.3.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.4.1+cu121
+ - Accelerate: 1.1.1
+ - Datasets: 2.19.1
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.3.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.4.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ceb4f236b27c3fde1e26fdae17df3f72174a2b64674befebd2b53da1d12224e
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff