kperkins411 committed
Commit 9153c44
Parent: 5141546

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
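
This pooling configuration keeps only the `[CLS]` token embedding, the standard choice for BGE-style models. For illustration, a minimal sketch of the equivalent module built with the public `sentence_transformers.models.Pooling` API (the keyword arguments mirror the JSON above; mean pooling is disabled explicitly because it is the class default):

```python
from sentence_transformers.models import Pooling

# CLS pooling: the sentence embedding is the final hidden state of the [CLS] token.
pooling = Pooling(
    word_embedding_dimension=768,
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
)
```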
README.md ADDED
@@ -0,0 +1,815 @@
+ ---
+ base_model: BAAI/bge-base-en-v1.5
+ datasets: []
+ language:
+ - en
+ library_name: sentence-transformers
+ license: apache-2.0
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:6300
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: We expect ME&T’s capital expenditures in 2024 to be around $2.0
+     billion to $2.5 billion.
+   sentences:
+   - What was the amount gained from the disposal of assets in 2022?
+   - What is the expected capital expenditure for ME&T in 2024?
+   - What is the expected total cost HP will incur from its Fiscal 2023 Plan, and how
+     is it primarily divided?
+ - source_sentence: Average invested capital is calculated as the sum of (i) the average
+     of our total assets, (ii) the average LIFO reserve and (iii) the average accumulated
+     depreciation and amortization; minus (i) the average taxes receivable, (ii) the
+     average trade accounts payable, (iii) the average accrued salaries and wages and
+     (iv) the average other current liabilities, excluding accrued income taxes.
+   sentences:
+   - What are the components and the effective tax rates for the year 2023 as reported
+     in the financial statements?
+   - How is average invested capital calculated for ROIC?
+   - How did the interest income change in fiscal year 2023 compared to the previous
+     year?
+ - source_sentence: Return on Invested Capital ('ROIC') as of May 31, 2023 was 31.5%
+     compared to 46.5% as of May 31, 2022.
+   sentences:
+   - How is NIKE's return on invested capital (ROIC) calculated, and what was its value
+     as of May 31, 2023?
+   - What role do medical directors play at outpatient dialysis centers, and what are
+     their general qualifications?
+   - What item number discusses legal proceedings in the report?
+ - source_sentence: Net cash used in financing activities was $506.5 million in the
+     year ended December 31, 2022, and increased to $656.5 million in the year ended
+     December 31, 2023.
+   sentences:
+   - How has the change in foreign exchange rates affected cash and cash equivalents
+     in 2023 and 2021?
+   - What kind of financial documents are included in Part IV, Item 15(a)(1) of the
+     Annual Report on Form 10-K?
+   - How did the net cash used in financing activities in 2023 compare to 2022?
+ - source_sentence: 'Alternative Payments Providers: These providers, such as closed
+     commerce ecosystems, BNPL solutions and cryptocurrency platforms, often have a
+     primary focus of enabling payments through ecommerce and mobile channels; however,
+     they are expanding or may expand their offerings to the physical point of sale.
+     These companies may process payments using in-house account transfers between
+     parties, electronic funds transfer networks like the ACH, global or local networks
+     like Visa, or some combination of the foregoing.'
+   sentences:
+   - What are some examples of alternative payments providers and how do they compete
+     with Visa?
+   - How much did the company's currently payable U.S. taxes amount to in 2023?
+   - What considerations are involved in recording an uncertain tax position?
+ model-index:
+ - name: BGE base Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 768
+       type: dim_768
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6885714285714286
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8328571428571429
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8742857142857143
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9142857142857143
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6885714285714286
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2776190476190476
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17485714285714282
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09142857142857141
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6885714285714286
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8328571428571429
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8742857142857143
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9142857142857143
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8044897381040067
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7690017006802718
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.772240177124622
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 512
+       type: dim_512
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6971428571428572
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8342857142857143
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8742857142857143
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9071428571428571
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6971428571428572
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.27809523809523806
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17485714285714282
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09071428571428569
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6971428571428572
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8342857142857143
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8742857142857143
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9071428571428571
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8044496489287004
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7712602040816322
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7750129601859859
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6914285714285714
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8257142857142857
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8714285714285714
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.91
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6914285714285714
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2752380952380953
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17428571428571427
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09099999999999998
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6914285714285714
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8257142857142857
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8714285714285714
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.91
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8034440275222344
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7690856009070293
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7724648546606009
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6742857142857143
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.81
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8542857142857143
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6742857142857143
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.27
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17085714285714282
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6742857142857143
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.81
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8542857142857143
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7881399973034273
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7522210884353742
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7560032496112399
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6385714285714286
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.7671428571428571
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8242857142857143
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.87
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6385714285714286
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2557142857142857
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16485714285714284
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.087
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6385714285714286
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.7671428571428571
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8242857142857143
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.87
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7528845651704559
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7154948979591831
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7205565552029373
+       name: Cosine Map@100
+ ---
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("kperkins411/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+     'Alternative Payments Providers: These providers, such as closed commerce ecosystems, BNPL solutions and cryptocurrency platforms, often have a primary focus of enabling payments through ecommerce and mobile channels; however, they are expanding or may expand their offerings to the physical point of sale. These companies may process payments using in-house account transfers between parties, electronic funds transfer networks like the ACH, global or local networks like Visa, or some combination of the foregoing.',
+     'What are some examples of alternative payments providers and how do they compete with Visa?',
+     "How much did the company's currently payable U.S. taxes amount to in 2023?",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
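+
+ Because the model was trained with `MatryoshkaLoss`, its embeddings can also be truncated to the smaller evaluated sizes (512, 256, 128, or 64) with only a modest quality drop. A minimal sketch using the public `truncate_dim` argument of `SentenceTransformer` (the choice of 256 here is illustrative):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Keep only the first 256 of the 768 embedding dimensions
+ model = SentenceTransformer("kperkins411/bge-base-financial-matryoshka", truncate_dim=256)
+ embeddings = model.encode(["How is average invested capital calculated for ROIC?"])
+ print(embeddings.shape)
+ # (1, 256)
+ ```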
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+ * Dataset: `dim_768`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.6886     |
+ | cosine_accuracy@3   | 0.8329     |
+ | cosine_accuracy@5   | 0.8743     |
+ | cosine_accuracy@10  | 0.9143     |
+ | cosine_precision@1  | 0.6886     |
+ | cosine_precision@3  | 0.2776     |
+ | cosine_precision@5  | 0.1749     |
+ | cosine_precision@10 | 0.0914     |
+ | cosine_recall@1     | 0.6886     |
+ | cosine_recall@3     | 0.8329     |
+ | cosine_recall@5     | 0.8743     |
+ | cosine_recall@10    | 0.9143     |
+ | cosine_ndcg@10      | 0.8045     |
+ | cosine_mrr@10       | 0.769      |
+ | **cosine_map@100**  | **0.7722** |
+
+ #### Information Retrieval
+ * Dataset: `dim_512`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value     |
+ |:--------------------|:----------|
+ | cosine_accuracy@1   | 0.6971    |
+ | cosine_accuracy@3   | 0.8343    |
+ | cosine_accuracy@5   | 0.8743    |
+ | cosine_accuracy@10  | 0.9071    |
+ | cosine_precision@1  | 0.6971    |
+ | cosine_precision@3  | 0.2781    |
+ | cosine_precision@5  | 0.1749    |
+ | cosine_precision@10 | 0.0907    |
+ | cosine_recall@1     | 0.6971    |
+ | cosine_recall@3     | 0.8343    |
+ | cosine_recall@5     | 0.8743    |
+ | cosine_recall@10    | 0.9071    |
+ | cosine_ndcg@10      | 0.8044    |
+ | cosine_mrr@10       | 0.7713    |
+ | **cosine_map@100**  | **0.775** |
+
+ #### Information Retrieval
+ * Dataset: `dim_256`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.6914     |
+ | cosine_accuracy@3   | 0.8257     |
+ | cosine_accuracy@5   | 0.8714     |
+ | cosine_accuracy@10  | 0.91       |
+ | cosine_precision@1  | 0.6914     |
+ | cosine_precision@3  | 0.2752     |
+ | cosine_precision@5  | 0.1743     |
+ | cosine_precision@10 | 0.091      |
+ | cosine_recall@1     | 0.6914     |
+ | cosine_recall@3     | 0.8257     |
+ | cosine_recall@5     | 0.8714     |
+ | cosine_recall@10    | 0.91       |
+ | cosine_ndcg@10      | 0.8034     |
+ | cosine_mrr@10       | 0.7691     |
+ | **cosine_map@100**  | **0.7725** |
+
+ #### Information Retrieval
+ * Dataset: `dim_128`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value     |
+ |:--------------------|:----------|
+ | cosine_accuracy@1   | 0.6743    |
+ | cosine_accuracy@3   | 0.81      |
+ | cosine_accuracy@5   | 0.8543    |
+ | cosine_accuracy@10  | 0.9       |
+ | cosine_precision@1  | 0.6743    |
+ | cosine_precision@3  | 0.27      |
+ | cosine_precision@5  | 0.1709    |
+ | cosine_precision@10 | 0.09      |
+ | cosine_recall@1     | 0.6743    |
+ | cosine_recall@3     | 0.81      |
+ | cosine_recall@5     | 0.8543    |
+ | cosine_recall@10    | 0.9       |
+ | cosine_ndcg@10      | 0.7881    |
+ | cosine_mrr@10       | 0.7522    |
+ | **cosine_map@100**  | **0.756** |
+
+ #### Information Retrieval
+ * Dataset: `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.6386     |
+ | cosine_accuracy@3   | 0.7671     |
+ | cosine_accuracy@5   | 0.8243     |
+ | cosine_accuracy@10  | 0.87       |
+ | cosine_precision@1  | 0.6386     |
+ | cosine_precision@3  | 0.2557     |
+ | cosine_precision@5  | 0.1649     |
+ | cosine_precision@10 | 0.087      |
+ | cosine_recall@1     | 0.6386     |
+ | cosine_recall@3     | 0.7671     |
+ | cosine_recall@5     | 0.8243     |
+ | cosine_recall@10    | 0.87       |
+ | cosine_ndcg@10      | 0.7529     |
+ | cosine_mrr@10       | 0.7155     |
+ | **cosine_map@100**  | **0.7206** |
+
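+ The five tables above evaluate the same model at progressively truncated embedding sizes. A sketch of how one such evaluation could be reproduced; the query/corpus dictionaries below are placeholders you would build from a held-out split, and `truncate_dim` is the public evaluator argument:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ model = SentenceTransformer("kperkins411/bge-base-financial-matryoshka")
+
+ # Placeholder data: id -> text for queries/corpus, id -> set of relevant corpus ids
+ queries = {"q1": "How is average invested capital calculated for ROIC?"}
+ corpus = {"d1": "Average invested capital is calculated as the sum of ..."}
+ relevant_docs = {"q1": {"d1"}}
+
+ for dim in [768, 512, 256, 128, 64]:
+     evaluator = InformationRetrievalEvaluator(
+         queries=queries,
+         corpus=corpus,
+         relevant_docs=relevant_docs,
+         name=f"dim_{dim}",
+         truncate_dim=dim,  # score using only the first `dim` dimensions
+     )
+     print(evaluator(model))
+ ```
+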
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 6,300 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | positive                                                                           | anchor                                                                             |
+   |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                             |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 45.51 tokens</li><li>max: 371 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 20.83 tokens</li><li>max: 45 tokens</li></ul> |
+ * Samples:
+   | positive | anchor |
+   |:---------|:-------|
+   | <code>Activities related to sales before 2023 experienced adjustments due to changes in estimates, impacting the rebates and chargebacks accounts, and led to an ending balance of $4,493 million for the year 2023.</code> | <code>What adjustments were made to the rebates and chargebacks balances for previous years' sales and how did they affect the end of year balance in 2023?</code> |
+   | <code>We’re focused on making hosting just as popular as traveling on Airbnb. We will continue to invest in growing the size and quality of our Host community. We plan to attract more Hosts globally by expanding use cases and supporting all different types of Hosts, including those who host occasionally.</code> | <code>What is Airbnb's long-term corporate strategy regarding hosting?</code> |
+   | <code>Due to protectionist measures in various regions, Nike has experienced increased product costs. The company responds by monitoring trends, engaging in processes to mitigate restrictions, and advocating for trade liberalization in trade agreements.</code> | <code>What challenges related to trade protectionism has Nike faced, and what measures has the company taken in response?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
+
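+ In code, these parameters correspond to wrapping the in-batch-negatives ranking loss so it is applied at every Matryoshka dimension. A sketch using the documented `sentence_transformers.losses` API:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("BAAI/bge-base-en-v1.5")
+
+ # Inner loss: in-batch negatives ranking over (anchor, positive) pairs
+ inner_loss = MultipleNegativesRankingLoss(model)
+ # Outer loss: apply the inner loss at each truncated dimension, weighted equally
+ loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
+ ```
+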
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `tf32`: True
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
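+ A sketch of how these non-default values map onto the v3 training API; `output_dir` is illustrative, and `save_strategy` is an assumption (it must match `eval_strategy` when `load_best_model_at_end` is enabled):
+
+ ```python
+ from sentence_transformers import SentenceTransformerTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="bge-base-financial-matryoshka",  # illustrative path
+     num_train_epochs=4,
+     per_device_train_batch_size=32,
+     per_device_eval_batch_size=16,
+     gradient_accumulation_steps=16,
+     learning_rate=2e-5,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     bf16=True,
+     tf32=True,
+     optim="adamw_torch_fused",
+     eval_strategy="epoch",
+     save_strategy="epoch",  # assumed, required by load_best_model_at_end
+     load_best_model_at_end=True,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts in a batch
+ )
+ ```
+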
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: True
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch    | Step   | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
+ |:--------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
+ | 0.96     | 3      | -             | 0.7116                 | 0.7341                 | 0.7448                 | 0.6550                | 0.7455                 |
+ | 1.92     | 6      | -             | 0.7317                 | 0.7520                 | 0.7586                 | 0.6975                | 0.7591                 |
+ | 2.88     | 9      | -             | 0.7334                 | 0.7553                 | 0.7631                 | 0.7039                | 0.7630                 |
+ | 3.2      | 10     | 3.3636        | -                      | -                      | -                      | -                     | -                      |
+ | **3.84** | **12** | **-**         | **0.7368**             | **0.759**              | **0.7634**             | **0.7054**            | **0.7638**             |
+ | 0.96     | 3      | -             | 0.7415                 | 0.7601                 | 0.7672                 | 0.7102                | 0.7661                 |
+ | 1.92     | 6      | -             | 0.7486                 | 0.7683                 | 0.7720                 | 0.7205                | 0.7718                 |
+ | 2.88     | 9      | -             | 0.7556                 | 0.7718                 | 0.7750                 | 0.7215                | 0.7717                 |
+ | 3.2      | 10     | 1.66          | -                      | -                      | -                      | -                     | -                      |
+ | **3.84** | **12** | **-**         | **0.756**              | **0.7725**             | **0.775**              | **0.7206**            | **0.7722**             |
+
+ * The bold rows denote the saved checkpoints; the epoch counter restarts where the logs for a second training run begin.
+
+ ### Framework Versions
+ - Python: 3.11.9
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.1.2+cu121
+ - Accelerate: 0.31.0
+ - Datasets: 2.19.1
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
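
The backbone is a standard BERT-base encoder (12 layers, 12 attention heads, hidden size 768, 512 max positions). A sketch of using the checkpoint directly with `transformers`, reproducing the CLS-pooling plus L2-normalization pipeline described in the model card above; the sample sentence is arbitrary:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kperkins411/bge-base-financial-matryoshka")
model = AutoModel.from_pretrained("kperkins411/bge-base-financial-matryoshka")

inputs = tokenizer(["How is ROIC calculated?"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

cls = outputs.last_hidden_state[:, 0]                        # CLS pooling (token 0)
embedding = torch.nn.functional.normalize(cls, p=2, dim=1)   # Normalize module
print(embedding.shape)
# torch.Size([1, 768])
```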
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.1.2+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78d4d9f47bd3a53539da03a8de0f65d2b290c9dfd67510f519c25b4a67383f3b
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
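
These three entries are the modules that `SentenceTransformer` chains together at load time (the paths match the `1_Pooling` and `2_Normalize` folders in this repository). A sketch of assembling the same pipeline by hand from the public `sentence_transformers.models` classes; loading the published checkpoint by name is equivalent:

```python
from sentence_transformers import SentenceTransformer, models

# Module 0: BERT encoder; module 1: CLS pooling; module 2: L2 normalization
transformer = models.Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
)
normalize = models.Normalize()

model = SentenceTransformer(modules=[transformer, pooling, normalize])
```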
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff