MugheesAwan11 committed on
Commit
6793140
1 Parent(s): 73eb110

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
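
The pooling config above enables only `pooling_mode_cls_token`, and the README's architecture listing follows the pooling layer with `Normalize()`. As a rough illustration of what that combination does, the sketch below takes the first token's vector and rescales it to unit length. The helper name `cls_pool_and_normalize` and the toy 4-dimensional inputs are assumptions made for illustration; they are not part of the library or of this commit.

```python
import math


def cls_pool_and_normalize(token_embeddings):
    """Return the first ([CLS]) token vector scaled to unit L2 norm."""
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    # A zero vector cannot be normalized, so it is returned unchanged.
    return [x / norm for x in cls_vector] if norm > 0 else list(cls_vector)


# Toy sequence of three tokens with 4-dimensional embeddings
# (the real model produces 768 dimensions per token).
tokens = [[3.0, 0.0, 4.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 0.0]]
sentence_embedding = cls_pool_and_normalize(tokens)
print(sentence_embedding)
```

Only the first row influences the result; the other token vectors are ignored, which is the difference from mean or max pooling.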
README.md ADDED
@@ -0,0 +1,692 @@
1
+ ---
2
+ base_model: BAAI/bge-base-en-v1.5
3
+ datasets: []
4
+ language:
5
+ - en
6
+ library_name: sentence-transformers
7
+ license: apache-2.0
8
+ metrics:
9
+ - cosine_accuracy@1
10
+ - cosine_accuracy@3
11
+ - cosine_accuracy@5
12
+ - cosine_accuracy@10
13
+ - cosine_precision@1
14
+ - cosine_precision@3
15
+ - cosine_precision@5
16
+ - cosine_precision@10
17
+ - cosine_recall@1
18
+ - cosine_recall@3
19
+ - cosine_recall@5
20
+ - cosine_recall@10
21
+ - cosine_ndcg@10
22
+ - cosine_mrr@10
23
+ - cosine_map@100
24
+ pipeline_tag: sentence-similarity
25
+ tags:
26
+ - sentence-transformers
27
+ - sentence-similarity
28
+ - feature-extraction
29
+ - generated_from_trainer
30
+ - dataset_size:1496
31
+ - loss:MatryoshkaLoss
32
+ - loss:MultipleNegativesRankingLoss
33
+ widget:
34
+ - source_sentence: We are currently involved in, and may in the future be involved
35
+ in, legal proceedings, claims, and government investigations in the ordinary course
36
+ of business. These include proceedings, claims, and investigations relating to,
37
+ among other things, regulatory matters, commercial matters, intellectual property,
38
+ competition, tax, employment, pricing, discrimination, consumer rights, personal
39
+ injury, and property rights.
40
+ sentences:
41
+ - What factors does the regulatory authority consider when ensuring data protection
42
+ in cross border transfers in Zimbabwe?
43
+ - How does Securiti enable enterprises to safely use data and the cloud while managing
44
+ security, privacy, and compliance risks?
45
+ - What types of legal issues is the company currently involved in?
46
+ - source_sentence: The Company’s minority market share in the global smartphone, personal
47
+ computer and tablet markets can make developers less inclined to develop or upgrade
48
+ software for the Company’s products and more inclined to devote their resources
49
+ to developing and upgrading software for competitors’ products with larger market
50
+ share. When developers focus their efforts on these competing platforms, the availability
51
+ and quality of applications for the Company’s devices can suffer.
52
+ sentences:
53
+ - What is the role of obtaining consent in Thailand's PDPA?
54
+ - Why might developers be less inclined to develop or upgrade software for the Company's
55
+ products?
56
+ - What caused the increase in energy generation and storage segment revenue in 2023?
57
+ - source_sentence: '** : EMEA (Europe, the Middle East and Africa) The Irish DPA implements
58
+ the GDPR into the national law by incorporating most of the provisions of the
59
+ GDPR with limited additions and deletions. It contains several provisions restricting
60
+ data subjects’ rights that they generally have under the GDPR, for example, where
61
+ restrictions are necessary for the enforcement of civil law claims. Resources*
62
+ : Irish DPA Overview Irish Cookie Guidance ### Japan #### Japan’s Act on the Protection
63
+ of Personal Information (APPI) **Effective Date (Amended APPI)** : April 01, 2022
64
+ **Region** : APAC (Asia-Pacific) Japan’s APPI regulates personal related information
65
+ and applies to any Personal Information Controller (the “PIC''''), that is a person
66
+ or entity providing personal related information for use in business in Japan.
67
+ The APPI also applies to the foreign'
68
+ sentences:
69
+ - What are the requirements for CIIOs and personal information processors in the
70
+ state cybersecurity department regarding cross-border data transfers and certifications?
71
+ - How does the Irish DPA implement the GDPR into national law?
72
+ - What is the current status of the Personal Data Protection Act in El Salvador
73
+ compared to Monaco and Venezuela?
74
+ - source_sentence: View Salesforce View Workday View GCP View Azure View Oracle View
75
+ US California CCPA View US California CPRA View European Union GDPR View Thailand’s
76
+ PDPA View China PIPL View Canada PIPEDA View Brazil's LGPD View \+ More View Privacy
77
+ View Security View Governance View Marketing View Resources Blog View Collateral
78
+ View Knowledge Center View Securiti Education View Company About Us View Partner
79
+ Program View Contact Us View News Coverage
80
+ sentences:
81
+ - What is the role of ANPD in ensuring LGPD compliance and protecting data subject
82
+ rights, including those related to health professionals?
83
+ - According to the Spanish data protection law, who is required to hire a DPO if
84
+ they possess certain information in the event of a data breach?
85
+ - What is GCP and how does it relate to privacy, security, governance, marketing,
86
+ and resources?
87
+ - source_sentence: 'vital interests of the data subject; Complying with an obligation
88
+ prescribed in PDPL, not being a contractual obligation, or complying with an order
89
+ from a competent court, the Public Prosecution, the investigation Judge, or the
90
+ Military Prosecution; or Preparing or pursuing a legal claim or defense. vs Articles:
91
+ 44 50, Recitals: 101, 112 GDPR states that personal data shall be transferred
92
+ to a third country or international organization with an adequate protection level
93
+ as determined by the EU Commission. Suppose there is no decision on an adequate
94
+ protection level. In that case, a transfer is only permitted when the data controller
95
+ or data processor provides appropriate safeguards that ensure data subject rights.
96
+ Appropriate safeguards include: BCRs with specific requirements (e.g., a legal
97
+ basis for processing, a retention period, and complaint procedures) Standard data
98
+ protection clauses adopted by the EU Commission, level of protection. If there
99
+ is no adequate level of protection, then data controllers in Turkey and abroad
100
+ shall commit, in writing, to provide an adequate level of protection abroad, as
101
+ well as agree on the fact that the transfer is permitted by the Board of KVKK.
102
+ vs Articles 44 50 Recitals 101, 112 GDPR states that personal data shall be transferred
103
+ to a third country or international organization with an adequate protection level
104
+ as determined by the EU Commission. Suppose there is no decision on an adequate
105
+ protection level. In that case, a transfer is only permitted when the data controller
106
+ or data processor provides appropriate safeguards that ensure data subject'' rights.
107
+ Appropriate safeguards include: BCRs with specific requirements (e.g., a legal
108
+ basis for processing, a retention period, and complaint procedures); standard
109
+ data protection clauses adopted by the EU Commission or by a supervisory authority;
110
+ an approved code'
111
+ sentences:
112
+ - What is the right to be informed in relation to personal data?
113
+ - In what situations can a controller process personal data to protect vital interests?
114
+ - What obligations in PDPL must data controllers or processors meet to protect personal
115
+ data transferred to a third country or international organization?
116
+ model-index:
117
+ - name: SentenceTransformer based on BAAI/bge-base-en-v1.5
118
+ results:
119
+ - task:
120
+ type: information-retrieval
121
+ name: Information Retrieval
122
+ dataset:
123
+ name: dim 768
124
+ type: dim_768
125
+ metrics:
126
+ - type: cosine_accuracy@1
127
+ value: 0.4020618556701031
128
+ name: Cosine Accuracy@1
129
+ - type: cosine_accuracy@3
130
+ value: 0.5773195876288659
131
+ name: Cosine Accuracy@3
132
+ - type: cosine_accuracy@5
133
+ value: 0.6804123711340206
134
+ name: Cosine Accuracy@5
135
+ - type: cosine_accuracy@10
136
+ value: 0.7938144329896907
137
+ name: Cosine Accuracy@10
138
+ - type: cosine_precision@1
139
+ value: 0.4020618556701031
140
+ name: Cosine Precision@1
141
+ - type: cosine_precision@3
142
+ value: 0.1924398625429553
143
+ name: Cosine Precision@3
144
+ - type: cosine_precision@5
145
+ value: 0.1360824742268041
146
+ name: Cosine Precision@5
147
+ - type: cosine_precision@10
148
+ value: 0.07938144329896907
149
+ name: Cosine Precision@10
150
+ - type: cosine_recall@1
151
+ value: 0.4020618556701031
152
+ name: Cosine Recall@1
153
+ - type: cosine_recall@3
154
+ value: 0.5773195876288659
155
+ name: Cosine Recall@3
156
+ - type: cosine_recall@5
157
+ value: 0.6804123711340206
158
+ name: Cosine Recall@5
159
+ - type: cosine_recall@10
160
+ value: 0.7938144329896907
161
+ name: Cosine Recall@10
162
+ - type: cosine_ndcg@10
163
+ value: 0.5821623921468868
164
+ name: Cosine Ndcg@10
165
+ - type: cosine_mrr@10
166
+ value: 0.5161471117656685
167
+ name: Cosine Mrr@10
168
+ - type: cosine_map@100
169
+ value: 0.5239473985229559
170
+ name: Cosine Map@100
171
+ - task:
172
+ type: information-retrieval
173
+ name: Information Retrieval
174
+ dataset:
175
+ name: dim 512
176
+ type: dim_512
177
+ metrics:
178
+ - type: cosine_accuracy@1
179
+ value: 0.41237113402061853
180
+ name: Cosine Accuracy@1
181
+ - type: cosine_accuracy@3
182
+ value: 0.5670103092783505
183
+ name: Cosine Accuracy@3
184
+ - type: cosine_accuracy@5
185
+ value: 0.6597938144329897
186
+ name: Cosine Accuracy@5
187
+ - type: cosine_accuracy@10
188
+ value: 0.7835051546391752
189
+ name: Cosine Accuracy@10
190
+ - type: cosine_precision@1
191
+ value: 0.41237113402061853
192
+ name: Cosine Precision@1
193
+ - type: cosine_precision@3
194
+ value: 0.18900343642611683
195
+ name: Cosine Precision@3
196
+ - type: cosine_precision@5
197
+ value: 0.1319587628865979
198
+ name: Cosine Precision@5
199
+ - type: cosine_precision@10
200
+ value: 0.07835051546391752
201
+ name: Cosine Precision@10
202
+ - type: cosine_recall@1
203
+ value: 0.41237113402061853
204
+ name: Cosine Recall@1
205
+ - type: cosine_recall@3
206
+ value: 0.5670103092783505
207
+ name: Cosine Recall@3
208
+ - type: cosine_recall@5
209
+ value: 0.6597938144329897
210
+ name: Cosine Recall@5
211
+ - type: cosine_recall@10
212
+ value: 0.7835051546391752
213
+ name: Cosine Recall@10
214
+ - type: cosine_ndcg@10
215
+ value: 0.5830365443881826
216
+ name: Cosine Ndcg@10
217
+ - type: cosine_mrr@10
218
+ value: 0.5208312878415973
219
+ name: Cosine Mrr@10
220
+ - type: cosine_map@100
221
+ value: 0.5295727941555394
222
+ name: Cosine Map@100
223
+ - task:
224
+ type: information-retrieval
225
+ name: Information Retrieval
226
+ dataset:
227
+ name: dim 256
228
+ type: dim_256
229
+ metrics:
230
+ - type: cosine_accuracy@1
231
+ value: 0.4020618556701031
232
+ name: Cosine Accuracy@1
233
+ - type: cosine_accuracy@3
234
+ value: 0.6185567010309279
235
+ name: Cosine Accuracy@3
236
+ - type: cosine_accuracy@5
237
+ value: 0.6494845360824743
238
+ name: Cosine Accuracy@5
239
+ - type: cosine_accuracy@10
240
+ value: 0.7628865979381443
241
+ name: Cosine Accuracy@10
242
+ - type: cosine_precision@1
243
+ value: 0.4020618556701031
244
+ name: Cosine Precision@1
245
+ - type: cosine_precision@3
246
+ value: 0.20618556701030924
247
+ name: Cosine Precision@3
248
+ - type: cosine_precision@5
249
+ value: 0.12989690721649483
250
+ name: Cosine Precision@5
251
+ - type: cosine_precision@10
252
+ value: 0.07628865979381441
253
+ name: Cosine Precision@10
254
+ - type: cosine_recall@1
255
+ value: 0.4020618556701031
256
+ name: Cosine Recall@1
257
+ - type: cosine_recall@3
258
+ value: 0.6185567010309279
259
+ name: Cosine Recall@3
260
+ - type: cosine_recall@5
261
+ value: 0.6494845360824743
262
+ name: Cosine Recall@5
263
+ - type: cosine_recall@10
264
+ value: 0.7628865979381443
265
+ name: Cosine Recall@10
266
+ - type: cosine_ndcg@10
267
+ value: 0.576352896876016
268
+ name: Cosine Ndcg@10
269
+ - type: cosine_mrr@10
270
+ value: 0.5177957781050565
271
+ name: Cosine Mrr@10
272
+ - type: cosine_map@100
273
+ value: 0.527827441661229
274
+ name: Cosine Map@100
275
+ ---
276
+
277
+ # SentenceTransformer based on BAAI/bge-base-en-v1.5
278
+
279
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
280
+
281
+ ## Model Details
282
+
283
+ ### Model Description
284
+ - **Model Type:** Sentence Transformer
285
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
286
+ - **Maximum Sequence Length:** 512 tokens
287
+ - **Output Dimensionality:** 768 dimensions
288
+ - **Similarity Function:** Cosine Similarity
289
+ <!-- - **Training Dataset:** Unknown -->
290
+ - **Language:** en
291
+ - **License:** apache-2.0
292
+
293
+ ### Model Sources
294
+
295
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
296
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
297
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
298
+
299
+ ### Full Model Architecture
300
+
301
+ ```
302
+ SentenceTransformer(
303
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
304
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
305
+ (2): Normalize()
306
+ )
307
+ ```
308
+
309
+ ## Usage
310
+
311
+ ### Direct Usage (Sentence Transformers)
312
+
313
+ First install the Sentence Transformers library:
314
+
315
+ ```bash
316
+ pip install -U sentence-transformers
317
+ ```
318
+
319
+ Then you can load this model and run inference.
320
+ ```python
321
+ from sentence_transformers import SentenceTransformer
322
+
323
+ # Download from the 🤗 Hub
324
+ model = SentenceTransformer("MugheesAwan11/bge-base-securiti-dataset-1-v22")
325
+ # Run inference
326
+ sentences = [
327
+ "vital interests of the data subject; Complying with an obligation prescribed in PDPL, not being a contractual obligation, or complying with an order from a competent court, the Public Prosecution, the investigation Judge, or the Military Prosecution; or Preparing or pursuing a legal claim or defense. vs Articles: 44 50, Recitals: 101, 112 GDPR states that personal data shall be transferred to a third country or international organization with an adequate protection level as determined by the EU Commission. Suppose there is no decision on an adequate protection level. In that case, a transfer is only permitted when the data controller or data processor provides appropriate safeguards that ensure data subject rights. Appropriate safeguards include: BCRs with specific requirements (e.g., a legal basis for processing, a retention period, and complaint procedures) Standard data protection clauses adopted by the EU Commission, level of protection. If there is no adequate level of protection, then data controllers in Turkey and abroad shall commit, in writing, to provide an adequate level of protection abroad, as well as agree on the fact that the transfer is permitted by the Board of KVKK. vs Articles 44 50 Recitals 101, 112 GDPR states that personal data shall be transferred to a third country or international organization with an adequate protection level as determined by the EU Commission. Suppose there is no decision on an adequate protection level. In that case, a transfer is only permitted when the data controller or data processor provides appropriate safeguards that ensure data subject' rights. Appropriate safeguards include: BCRs with specific requirements (e.g., a legal basis for processing, a retention period, and complaint procedures); standard data protection clauses adopted by the EU Commission or by a supervisory authority; an approved code",
328
+ 'What obligations in PDPL must data controllers or processors meet to protect personal data transferred to a third country or international organization?',
329
+ 'In what situations can a controller process personal data to protect vital interests?',
330
+ ]
331
+ embeddings = model.encode(sentences)
332
+ print(embeddings.shape)
333
+ # (3, 768)
334
+
335
+ # Get the similarity scores for the embeddings
336
+ similarities = model.similarity(embeddings, embeddings)
337
+ print(similarities.shape)
338
+ # torch.Size([3, 3])
339
+ ```
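
Because the card lists `MatryoshkaLoss` with dimensions 768, 512 and 256, the embeddings are presumably still usable when cut down to their first 512 or 256 components, provided they are re-normalized before cosine comparison. The sketch below shows that truncation step on toy vectors in plain Python. The helper names `truncate_and_normalize` and `cosine` are hypothetical, and the 8-dimensional lists merely stand in for real `model.encode` output. Recent versions of sentence-transformers also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which may be the more convenient route.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # a zero vector cannot be normalized
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Toy 8-dimensional vectors standing in for 768-dimensional model outputs.
query = [0.5, 0.1, -0.3, 0.4, 0.2, -0.1, 0.6, 0.3]
passage = [0.4, 0.2, -0.2, 0.5, -0.3, 0.1, 0.5, 0.2]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

Smaller dimensions trade some retrieval quality for storage and speed; the evaluation tables in this card report how much.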
340
+
341
+ <!--
342
+ ### Direct Usage (Transformers)
343
+
344
+ <details><summary>Click to see the direct usage in Transformers</summary>
345
+
346
+ </details>
347
+ -->
348
+
349
+ <!--
350
+ ### Downstream Usage (Sentence Transformers)
351
+
352
+ You can finetune this model on your own dataset.
353
+
354
+ <details><summary>Click to expand</summary>
355
+
356
+ </details>
357
+ -->
358
+
359
+ <!--
360
+ ### Out-of-Scope Use
361
+
362
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
363
+ -->
364
+
365
+ ## Evaluation
366
+
367
+ ### Metrics
368
+
369
+ #### Information Retrieval
370
+ * Dataset: `dim_768`
371
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
372
+
373
+ | Metric | Value |
374
+ |:--------------------|:-----------|
375
+ | cosine_accuracy@1 | 0.4021 |
376
+ | cosine_accuracy@3 | 0.5773 |
377
+ | cosine_accuracy@5 | 0.6804 |
378
+ | cosine_accuracy@10 | 0.7938 |
379
+ | cosine_precision@1 | 0.4021 |
380
+ | cosine_precision@3 | 0.1924 |
381
+ | cosine_precision@5 | 0.1361 |
382
+ | cosine_precision@10 | 0.0794 |
383
+ | cosine_recall@1 | 0.4021 |
384
+ | cosine_recall@3 | 0.5773 |
385
+ | cosine_recall@5 | 0.6804 |
386
+ | cosine_recall@10 | 0.7938 |
387
+ | cosine_ndcg@10 | 0.5822 |
388
+ | cosine_mrr@10 | 0.5161 |
389
+ | **cosine_map@100** | **0.5239** |
390
+
391
+ #### Information Retrieval
392
+ * Dataset: `dim_512`
393
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
394
+
395
+ | Metric | Value |
396
+ |:--------------------|:-----------|
397
+ | cosine_accuracy@1 | 0.4124 |
398
+ | cosine_accuracy@3 | 0.567 |
399
+ | cosine_accuracy@5 | 0.6598 |
400
+ | cosine_accuracy@10 | 0.7835 |
401
+ | cosine_precision@1 | 0.4124 |
402
+ | cosine_precision@3 | 0.189 |
403
+ | cosine_precision@5 | 0.132 |
404
+ | cosine_precision@10 | 0.0784 |
405
+ | cosine_recall@1 | 0.4124 |
406
+ | cosine_recall@3 | 0.567 |
407
+ | cosine_recall@5 | 0.6598 |
408
+ | cosine_recall@10 | 0.7835 |
409
+ | cosine_ndcg@10 | 0.583 |
410
+ | cosine_mrr@10 | 0.5208 |
411
+ | **cosine_map@100** | **0.5296** |
412
+
413
+ #### Information Retrieval
414
+ * Dataset: `dim_256`
415
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
416
+
417
+ | Metric | Value |
418
+ |:--------------------|:-----------|
419
+ | cosine_accuracy@1 | 0.4021 |
420
+ | cosine_accuracy@3 | 0.6186 |
421
+ | cosine_accuracy@5 | 0.6495 |
422
+ | cosine_accuracy@10 | 0.7629 |
423
+ | cosine_precision@1 | 0.4021 |
424
+ | cosine_precision@3 | 0.2062 |
425
+ | cosine_precision@5 | 0.1299 |
426
+ | cosine_precision@10 | 0.0763 |
427
+ | cosine_recall@1 | 0.4021 |
428
+ | cosine_recall@3 | 0.6186 |
429
+ | cosine_recall@5 | 0.6495 |
430
+ | cosine_recall@10 | 0.7629 |
431
+ | cosine_ndcg@10 | 0.5764 |
432
+ | cosine_mrr@10 | 0.5178 |
433
+ | **cosine_map@100** | **0.5278** |
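
In all three tables, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for `dim_768`, 0.5773 / 3 ≈ 0.1924). That pattern is what one would expect if every evaluation query has exactly one relevant passage, although the card does not state this. The sketch below works through the relationship on made-up ranks; `ir_metrics` is a hypothetical helper, not the library's `InformationRetrievalEvaluator`.

```python
def ir_metrics(ranks, ks=(1, 3, 5, 10)):
    """Compute accuracy@k, precision@k, recall@k and MRR@10.

    `ranks` holds, per query, the 1-based rank of its single relevant
    passage, or None if that passage was not retrieved.
    """
    n = len(ranks)
    out = {}
    for k in ks:
        hits = sum(1 for r in ranks if r is not None and r <= k)
        out[f"accuracy@{k}"] = hits / n
        out[f"recall@{k}"] = hits / n  # one relevant passage per query
        out[f"precision@{k}"] = hits / (n * k)
    out["mrr@10"] = sum(1.0 / r for r in ranks if r is not None and r <= 10) / n
    return out


# Made-up ranks for five queries (not taken from the evaluation above).
metrics = ir_metrics([1, 2, 7, None, 1])
print(metrics["accuracy@3"], metrics["precision@3"], metrics["mrr@10"])
```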
434
+
435
+ <!--
436
+ ## Bias, Risks and Limitations
437
+
438
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
439
+ -->
440
+
441
+ <!--
442
+ ### Recommendations
443
+
444
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
445
+ -->
446
+
447
+ ## Training Details
448
+
449
+ ### Training Dataset
450
+
451
+ #### Unnamed Dataset
452
+
453
+
454
+ * Size: 1,496 training samples
455
+ * Columns: <code>positive</code> and <code>anchor</code>
456
+ * Approximate statistics based on the first 1000 samples:
457
+ | | positive | anchor |
458
+ |:--------|:-------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
459
+ | type | string | string |
460
+ | details | <ul><li>min: 67 tokens</li><li>mean: 216.99 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 21.6 tokens</li><li>max: 102 tokens</li></ul> |
461
+ * Samples:
462
+ | positive | anchor |
463
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------|
464
+ | <code>Leader in Data Privacy View Events Spotlight Talks Education Contact Us Schedule a Demo Products By Use Cases By Roles Data Command Center View Learn more Asset and Data Discovery Discover dark and native data assets Learn more Data Access Intelligence & Governance Identify which users have access to sensitive data and prevent unauthorized access Learn more Data Privacy Automation PrivacyCenter.Cloud | Data Mapping | DSR Automation | Assessment Automation | Vendor Assessment | Breach Management | Privacy Notice Learn more Sensitive Data Intelligence Discover & Classify Structured and Unstructured Data | People Data Graph Learn more Data Flow Intelligence & Governance Prevent sensitive data sprawl through real-time streaming platforms Learn more Data Consent Automation First Party Consent | Third Party & Cookie</code> | <code>What is the purpose of the Data Command Center?</code> |
465
+ | <code>data subject must be notified of any such extension within one month of receiving the request, along with the reasons for the delay and the possibility of complaining to the supervisory authority. The right to restrict processing applies when the data subject contests data accuracy, the processing is unlawful, and the data subject opposes erasure and requests restriction. The controller must inform data subjects before any such restriction is lifted. Under GDPR, the data subject also has the right to obtain from the controller the rectification of inaccurate personal data and to have incomplete personal data completed. Article: 22 Under PDPL, if a decision is based solely on automated processing of personal data intended to assess the data subject regarding his/her performance at work, financial standing, credit-worthiness, reliability, or conduct, then the data subject has the right to request processing in a manner that is not solely automated. This right shall not apply where the decision is taken in the course of entering into</code> | <code>What is the requirement for notifying the data subject of any extension under GDPR and PDPL?</code> |
466
+ | <code>Automation PrivacyCenter.Cloud | Data Mapping | DSR Automation | Assessment Automation | Vendor Assessment | Breach Management | Privacy Notice Learn more Sensitive Data Intelligence Discover & Classify Structured and Unstructured Data | People Data Graph Learn more Data Flow Intelligence & Governance Prevent sensitive data sprawl through real-time streaming platforms Learn more Data Consent Automation First Party Consent | Third Party & Cookie Consent Learn more Data Security Posture Management Secure sensitive data in hybrid multicloud and SaaS environments Learn more Data Breach Impact Analysis & Response Analyze impact of a data breach and coordinate response per global regulatory obligations Learn more Data Catalog Automatically catalog datasets and enable users to find, understand, trust and access data Learn more Data Lineage Track changes and transformations of, PrivacyCenter.Cloud | Data Mapping | DSR Automation | Assessment Automation | Vendor Assessment | Breach Management | Privacy Notice Learn more Sensitive Data Intelligence Discover & Classify Structured and Unstructured Data | People Data Graph Learn more Data Flow Intelligence & Governance Prevent sensitive data sprawl through real-time streaming platforms Learn more Data Consent Automation First Party Consent | Third Party & Cookie Consent Learn more Data Security Posture Management Secure sensitive data in hybrid multicloud and SaaS environments Learn more Data Breach Impact Analysis & Response Analyze impact of a data breach and coordinate response per global regulatory obligations Learn more Data Catalog Automatically catalog datasets and enable users to find, understand, trust and access data Learn more Data Lineage Track changes and transformations of data throughout its</code> | <code>What is the purpose of Third Party & Cookie Consent in data automation and security?</code> |
467
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
468
+ ```json
469
+ {
470
+ "loss": "MultipleNegativesRankingLoss",
471
+ "matryoshka_dims": [
472
+ 768,
473
+ 512,
474
+ 256
475
+ ],
476
+ "matryoshka_weights": [
477
+ 1,
478
+ 1,
479
+ 1
480
+ ],
481
+ "n_dims_per_step": -1
482
+ }
483
+ ```
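
The parameters above describe a `MultipleNegativesRankingLoss` applied at three embedding sizes with equal weights. As a rough picture of what that computes, the sketch below implements an in-batch-negatives cross-entropy on cosine similarities and sums it over truncated copies of the embeddings. It is a plain-Python approximation under stated assumptions: the function names are hypothetical, the similarity scale of 20 is assumed to match the library default, and the toy 4-dimensional batch replaces real 768-dimensional embeddings.

```python
import math


def _cos(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss: cross-entropy where positives[i]
    is the correct match for anchors[i] and all other positives are negatives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * _cos(a, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over truncated embeddings."""
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )


# Toy 4-dimensional batch; the configuration above uses dims [768, 512, 256].
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

Training on every prefix at once is what lets the leading components carry most of the signal, so shorter embeddings remain useful.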
484
+
485
+ ### Training Hyperparameters
486
+ #### Non-Default Hyperparameters
487
+
488
+ - `eval_strategy`: epoch
489
+ - `per_device_train_batch_size`: 32
490
+ - `per_device_eval_batch_size`: 16
491
+ - `learning_rate`: 2e-05
492
+ - `num_train_epochs`: 1
493
+ - `lr_scheduler_type`: cosine
494
+ - `warmup_ratio`: 0.1
495
+ - `bf16`: True
496
+ - `tf32`: True
497
+ - `load_best_model_at_end`: True
498
+ - `optim`: adamw_torch_fused
499
+ - `batch_sampler`: no_duplicates
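
With 1,496 training samples and a per-device batch size of 32, one epoch comes to 47 optimizer steps on a single device, which agrees with the training log (step 10 is reported as epoch 0.2128, and 10 / 47 ≈ 0.2128). The sketch below traces the learning rate over those steps for a cosine schedule with `warmup_ratio` 0.1. It assumes single-device training, that warmup steps are rounded up, and that the decay is a half cosine to zero; `lr_at_step` is a hypothetical helper, not a call into the trainer.

```python
import math


def lr_at_step(step, base_lr=2e-5, total_steps=47, warmup_ratio=0.1):
    """Linear warmup followed by half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total_steps = math.ceil(1496 / 32)  # 47 steps for one epoch
for step in (0, 5, 26, 47):
    print(step, lr_at_step(step, total_steps=total_steps))
```

Under these assumptions the rate climbs for the first 5 steps, peaks at 2e-05, and falls to roughly half that value by step 26.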
500
+
501
+ #### All Hyperparameters
502
+ <details><summary>Click to expand</summary>
503
+
504
+ - `overwrite_output_dir`: False
505
+ - `do_predict`: False
506
+ - `eval_strategy`: epoch
507
+ - `prediction_loss_only`: True
508
+ - `per_device_train_batch_size`: 32
509
+ - `per_device_eval_batch_size`: 16
510
+ - `per_gpu_train_batch_size`: None
511
+ - `per_gpu_eval_batch_size`: None
512
+ - `gradient_accumulation_steps`: 1
513
+ - `eval_accumulation_steps`: None
514
+ - `learning_rate`: 2e-05
515
+ - `weight_decay`: 0.0
516
+ - `adam_beta1`: 0.9
517
+ - `adam_beta2`: 0.999
518
+ - `adam_epsilon`: 1e-08
519
+ - `max_grad_norm`: 1.0
520
+ - `num_train_epochs`: 1
521
+ - `max_steps`: -1
522
+ - `lr_scheduler_type`: cosine
523
+ - `lr_scheduler_kwargs`: {}
524
+ - `warmup_ratio`: 0.1
525
+ - `warmup_steps`: 0
526
+ - `log_level`: passive
527
+ - `log_level_replica`: warning
528
+ - `log_on_each_node`: True
529
+ - `logging_nan_inf_filter`: True
530
+ - `save_safetensors`: True
531
+ - `save_on_each_node`: False
532
+ - `save_only_model`: False
533
+ - `restore_callback_states_from_checkpoint`: False
534
+ - `no_cuda`: False
535
+ - `use_cpu`: False
536
+ - `use_mps_device`: False
537
+ - `seed`: 42
538
+ - `data_seed`: None
539
+ - `jit_mode_eval`: False
540
+ - `use_ipex`: False
541
+ - `bf16`: True
542
+ - `fp16`: False
543
+ - `fp16_opt_level`: O1
544
+ - `half_precision_backend`: auto
545
+ - `bf16_full_eval`: False
546
+ - `fp16_full_eval`: False
547
+ - `tf32`: True
548
+ - `local_rank`: 0
549
+ - `ddp_backend`: None
550
+ - `tpu_num_cores`: None
551
+ - `tpu_metrics_debug`: False
552
+ - `debug`: []
553
+ - `dataloader_drop_last`: False
554
+ - `dataloader_num_workers`: 0
555
+ - `dataloader_prefetch_factor`: None
556
+ - `past_index`: -1
557
+ - `disable_tqdm`: False
558
+ - `remove_unused_columns`: True
559
+ - `label_names`: None
560
+ - `load_best_model_at_end`: True
561
+ - `ignore_data_skip`: False
562
+ - `fsdp`: []
563
+ - `fsdp_min_num_params`: 0
564
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
565
+ - `fsdp_transformer_layer_cls_to_wrap`: None
566
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
567
+ - `deepspeed`: None
568
+ - `label_smoothing_factor`: 0.0
569
+ - `optim`: adamw_torch_fused
570
+ - `optim_args`: None
571
+ - `adafactor`: False
572
+ - `group_by_length`: False
573
+ - `length_column_name`: length
574
+ - `ddp_find_unused_parameters`: None
575
+ - `ddp_bucket_cap_mb`: None
576
+ - `ddp_broadcast_buffers`: False
577
+ - `dataloader_pin_memory`: True
578
+ - `dataloader_persistent_workers`: False
579
+ - `skip_memory_metrics`: True
580
+ - `use_legacy_prediction_loop`: False
581
+ - `push_to_hub`: False
582
+ - `resume_from_checkpoint`: None
583
+ - `hub_model_id`: None
584
+ - `hub_strategy`: every_save
585
+ - `hub_private_repo`: False
586
+ - `hub_always_push`: False
587
+ - `gradient_checkpointing`: False
588
+ - `gradient_checkpointing_kwargs`: None
589
+ - `include_inputs_for_metrics`: False
590
+ - `eval_do_concat_batches`: True
591
+ - `fp16_backend`: auto
592
+ - `push_to_hub_model_id`: None
593
+ - `push_to_hub_organization`: None
594
+ - `mp_parameters`:
595
+ - `auto_find_batch_size`: False
596
+ - `full_determinism`: False
597
+ - `torchdynamo`: None
598
+ - `ray_scope`: last
599
+ - `ddp_timeout`: 1800
600
+ - `torch_compile`: False
601
+ - `torch_compile_backend`: None
602
+ - `torch_compile_mode`: None
603
+ - `dispatch_batches`: None
604
+ - `split_batches`: None
605
+ - `include_tokens_per_second`: False
606
+ - `include_num_input_tokens_seen`: False
607
+ - `neftune_noise_alpha`: None
608
+ - `optim_target_modules`: None
609
+ - `batch_eval_metrics`: False
610
+ - `batch_sampler`: no_duplicates
611
+ - `multi_dataset_batch_sampler`: proportional
612
+
613
+ </details>
614
+
615
+ ### Training Logs
616
+ | Epoch | Step | Training Loss | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_768_cosine_map@100 |
617
+ |:-------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|
618
+ | 0.2128 | 10 | 3.8486 | - | - | - |
619
+ | 0.4255 | 20 | 2.3611 | - | - | - |
620
+ | 0.6383 | 30 | 2.3209 | - | - | - |
621
+ | 0.8511 | 40 | 1.3248 | - | - | - |
622
+ | **1.0** | **47** | **-** | **0.5278** | **0.5296** | **0.5239** |
623
+
624
+ * The bold row denotes the saved checkpoint.
625
+
626
+ ### Framework Versions
627
+ - Python: 3.10.14
628
+ - Sentence Transformers: 3.0.1
629
+ - Transformers: 4.41.2
630
+ - PyTorch: 2.1.2+cu121
631
+ - Accelerate: 0.31.0
632
+ - Datasets: 2.19.1
633
+ - Tokenizers: 0.19.1
634
+
635
+ ## Citation
636
+
637
+ ### BibTeX
638
+
639
+ #### Sentence Transformers
640
+ ```bibtex
641
+ @inproceedings{reimers-2019-sentence-bert,
642
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
643
+ author = "Reimers, Nils and Gurevych, Iryna",
644
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
645
+ month = "11",
646
+ year = "2019",
647
+ publisher = "Association for Computational Linguistics",
648
+ url = "https://arxiv.org/abs/1908.10084",
649
+ }
650
+ ```
651
+
652
+ #### MatryoshkaLoss
653
+ ```bibtex
654
+ @misc{kusupati2024matryoshka,
655
+ title={Matryoshka Representation Learning},
656
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
657
+ year={2024},
658
+ eprint={2205.13147},
659
+ archivePrefix={arXiv},
660
+ primaryClass={cs.LG}
661
+ }
662
+ ```
663
+
664
+ #### MultipleNegativesRankingLoss
665
+ ```bibtex
666
+ @misc{henderson2017efficient,
667
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
668
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
669
+ year={2017},
670
+ eprint={1705.00652},
671
+ archivePrefix={arXiv},
672
+ primaryClass={cs.CL}
673
+ }
674
+ ```
675
+
676
+ <!--
677
+ ## Glossary
678
+
679
+ *Clearly define terms in order to be accessible across audiences.*
680
+ -->
681
+
682
+ <!--
683
+ ## Model Card Authors
684
+
685
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
686
+ -->
687
+
688
+ <!--
689
+ ## Model Card Contact
690
+
691
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
692
+ -->
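
The hyperparameters recorded in the README diff above (`learning_rate`: 2e-05, `lr_scheduler_type`: cosine, `warmup_ratio`: 0.1) and the 47 optimizer steps shown in the training log suggest a short linear warmup followed by cosine decay. The following is a minimal sketch of such a schedule in plain Python. It assumes the common linear-warmup-plus-half-cosine formula and assumes the warmup length is the ratio times the total step count, rounded up. The name `lr_at_step` is hypothetical, and this is not the trainer's own code.

```python
import math

BASE_LR = 2e-5       # `learning_rate` from the README
WARMUP_RATIO = 0.1   # `warmup_ratio` from the README
TOTAL_STEPS = 47     # last step in the training log (one epoch)

# Assumption: warmup length is the ratio times the total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at_step(step: int) -> float:
    """Learning rate at an optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 5, 10, 20, 30, 40, 47):
        print(f"step {step:2d}: lr = {lr_at_step(step):.3e}")
```

Under these assumptions the warmup covers only the first 5 steps, so most of the single epoch is spent on the decaying part of the curve.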
config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "_name_or_path": "BAAI/bge-base-en-v1.5",
3
+ "architectures": [
4
+ "BertModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "gradient_checkpointing": false,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 768,
12
+ "id2label": {
13
+ "0": "LABEL_0"
14
+ },
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 3072,
17
+ "label2id": {
18
+ "LABEL_0": 0
19
+ },
20
+ "layer_norm_eps": 1e-12,
21
+ "max_position_embeddings": 512,
22
+ "model_type": "bert",
23
+ "num_attention_heads": 12,
24
+ "num_hidden_layers": 12,
25
+ "pad_token_id": 0,
26
+ "position_embedding_type": "absolute",
27
+ "torch_dtype": "float32",
28
+ "transformers_version": "4.41.2",
29
+ "type_vocab_size": 2,
30
+ "use_cache": true,
31
+ "vocab_size": 30522
32
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.0.1",
4
+ "transformers": "4.41.2",
5
+ "pytorch": "2.1.2+cu121"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": null
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:91af74728ff237a7fe5695e60dce8311a52ad803d2a97f310b2029833d3515e0
3
+ size 437951328
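
As a rough sanity check, the file size in the Git LFS pointer above can be compared with the parameter count implied by `config.json`. The sketch below assumes the standard BERT layout (embeddings, 12 encoder layers, and a pooler head) stored as float32, as `torch_dtype` indicates. The helper names are hypothetical, and the result is an estimate rather than a statement about the exact file contents.

```python
# Values copied from config.json in this commit.
VOCAB_SIZE = 30522
HIDDEN = 768
INTERMEDIATE = 3072
MAX_POSITIONS = 512
TYPE_VOCAB = 2
NUM_LAYERS = 12

# Size reported by the model.safetensors Git LFS pointer, in bytes.
FILE_SIZE_BYTES = 437951328
BYTES_PER_FLOAT32 = 4


def linear(n_in: int, n_out: int) -> int:
    """Parameters of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def layer_norm(n: int) -> int:
    """Parameters of a LayerNorm: scale plus shift."""
    return 2 * n


def bert_param_count() -> int:
    """Estimated parameter count, assuming the standard BERT layout."""
    embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
    # Query, key, value, and output projections, then a LayerNorm.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + layer_norm(HIDDEN)
    )
    pooler = linear(HIDDEN, HIDDEN)  # assumption: the pooler head is stored
    return embeddings + NUM_LAYERS * (attention + feed_forward) + pooler


if __name__ == "__main__":
    params = bert_param_count()
    weight_bytes = params * BYTES_PER_FLOAT32
    print(f"estimated parameters: {params:,}")
    print(f"estimated weight bytes: {weight_bytes:,}")
    print(f"bytes left over in file: {FILE_SIZE_BYTES - weight_bytes:,}")
```

With these assumptions the estimate comes to roughly 109.5 million parameters, and the float32 weights account for all but about 22 KB of the reported size. That small remainder would be consistent with file header metadata, though the diff itself does not confirm this.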
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
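
The three modules listed above run in sequence: the Transformer produces one vector per token, the Pooling module reduces them to a single vector (CLS-token pooling, according to `1_Pooling/config.json` earlier in this commit), and the Normalize module scales the result to unit length. The sketch below illustrates the last two stages on plain Python lists. The function names and the toy input are hypothetical, and this is not the library's implementation.

```python
import math
from typing import List


def cls_pool(token_embeddings: List[List[float]]) -> List[float]:
    """CLS pooling: keep only the first token's vector.

    Assumes the tokenizer placed [CLS] at position 0.
    """
    return list(token_embeddings[0])


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def embed(token_embeddings: List[List[float]]) -> List[float]:
    """Pooling followed by Normalize, mirroring the module order in modules.json."""
    return l2_normalize(cls_pool(token_embeddings))


if __name__ == "__main__":
    # Toy stand-in for Transformer output: 3 tokens, 3 dimensions each.
    toy_tokens = [[3.0, 4.0, 0.0], [1.0, 1.0, 1.0], [0.0, 2.0, 5.0]]
    print(embed(toy_tokens))
```

Because the final vectors have unit length, the cosine similarity between two embeddings reduces to their dot product.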
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": true
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "mask_token": "[MASK]",
49
+ "model_max_length": 512,
50
+ "never_split": null,
51
+ "pad_token": "[PAD]",
52
+ "sep_token": "[SEP]",
53
+ "strip_accents": null,
54
+ "tokenize_chinese_chars": true,
55
+ "tokenizer_class": "BertTokenizer",
56
+ "unk_token": "[UNK]"
57
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff