Behrni committed on
Commit 2819e39
1 Parent(s): 1577e8d

End of training
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
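In the pooling config above, only `pooling_mode_cls_token` is enabled: the sentence embedding is simply the hidden state of the first (`[CLS]`) token, and the remaining token vectors are discarded. A minimal sketch with made-up 4-dimensional token vectors (the real model uses 384 dimensions):

```python
# CLS-token pooling: the sentence vector is the hidden state of the first
# ([CLS]) token; all other token vectors are ignored.
def cls_pooling(token_embeddings):
    """token_embeddings: list of per-token vectors, [CLS] first."""
    return token_embeddings[0]

# Hypothetical hidden states for a 3-token sequence (values are made up).
tokens = [[0.1, 0.2, 0.3, 0.4],   # [CLS]
          [0.5, 0.5, 0.5, 0.5],
          [0.9, 0.8, 0.7, 0.6]]   # [SEP]
print(cls_pooling(tokens))  # → [0.1, 0.2, 0.3, 0.4]
```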
README.md ADDED
@@ -0,0 +1,478 @@
+ ---
+ base_model: avsolatorio/GIST-small-Embedding-v0
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ - dot_accuracy
+ - manhattan_accuracy
+ - euclidean_accuracy
+ - max_accuracy
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:3414
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: For all components of the structural-technical infrastructure,
+     at least the intervals and requirements for inspection and maintenance recommended
+     by the manufacturer or set by standards shall be complied with. Inspections and
+     maintenance work must be recorded. Fire barriers must be checked to see if they
+     are intact. The results must be documented.
+   sentences:
+   - Security perimeters shall be defined and used to protect areas that contain information
+     and other associated assets.
+   - The use of resources shall be monitored and adjusted in line with current and
+     expected capacity requirements.
+   - A.7.1
+ - source_sentence: All employees and external users must be instructed and sensitized
+     in the safe handling of IT, ICS and IoT components, as far as this is relevant
+     for their work contexts. To this end, binding, understandable and up-to-date guidelines
+     for the use of the respective components must be available. If IT, ICS or IoT
+     systems or services are used in a way that contradicts the interests of the institution,
+     this must be communicated.
+   sentences:
+   - Security perimeters shall be defined and used to protect areas that contain information
+     and other associated assets.
+   - Records shall be protected from loss, destruction, falsification, unauthorized
+     access and unauthorized release.
+   - A.5.33
+ - source_sentence: Data Lost Prevention (DLP) systems should be used at network level.
+   sentences:
+   - A.5.15
+   - Information security shall be integrated into project management.
+   - Rules to control physical and logical access to information and other associated
+     assets shall be established and implemented based on business and information
+     security requirements.
+ - source_sentence: Ensure that audit records contain information that establishes
+     the following:a. What type of event occurred;b. When the event occurred;c. Where
+     the event occurred;d. Source of the event;e. Outcome of the event; andf. Identity
+     of any individuals, subjects, or objects/entities associated with the event.
+   sentences:
+   - 'Für alle Arten von Übertragungseinrichtungen innerhalb der
+ 
+     Organisation und zwischen der Organisation und anderen Parteien
+ 
+     müssen Regeln, Verfahren oder Vereinbarungen zur
+ 
+     Informationsübermittlung vorhanden sein.'
+   - A.8.15
+   - 'Protokolle, die Aktivitäten, Ausnahmen, Fehler und andere relevante
+ 
+     Ereignisse aufzeichnen, müssen erstellt, gespeichert, geschützt und
+ 
+     analysiert werden.'
+ - source_sentence: A security incident must inform all affected internal and external
+     bodies in a timely manner. It is necessary to check whether the Data Protection
+     Officer, the Works and Staff Council and employees from the Legal Department need
+     to be involved. Similarly, the reporting requirements for authorities and regulated
+     sectors must be taken into account. It is also necessary to ensure that relevant
+     bodies are informed of the necessary measures.
+   sentences:
+   - Rules to control physical and logical access to information and other associated
+     assets shall be established and implemented based on business and information
+     security requirements.
+   - The organization shall plan and prepare for managing information security incidents
+     by defining, establishing and communicating information security incident management
+     processes, roles and responsibilities.
+   - A.5.24
+ model-index:
+ - name: SentenceTransformer based on avsolatorio/GIST-small-Embedding-v0
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: GIST small Embedding v0 4 batch 10 epoch all data en unique split robustness
+         42 eval
+       type: GIST-small-Embedding-v0-4_batch_10_epoch_all_data_en_unique_split_robustness_42_eval
+     metrics:
+     - type: cosine_accuracy
+       value: 0.8762006403415155
+       name: Cosine Accuracy
+     - type: dot_accuracy
+       value: 0.09498399146211313
+       name: Dot Accuracy
+     - type: manhattan_accuracy
+       value: 0.8697972251867663
+       name: Manhattan Accuracy
+     - type: euclidean_accuracy
+       value: 0.8762006403415155
+       name: Euclidean Accuracy
+     - type: max_accuracy
+       value: 0.8762006403415155
+       name: Max Accuracy
+ ---
+ 
+ # SentenceTransformer based on avsolatorio/GIST-small-Embedding-v0
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [avsolatorio/GIST-small-Embedding-v0](https://huggingface.co/avsolatorio/GIST-small-Embedding-v0). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [avsolatorio/GIST-small-Embedding-v0](https://huggingface.co/avsolatorio/GIST-small-Embedding-v0) <!-- at revision d6c4190f9e01b9994dc7cac99cf2f2b85cfb57bc -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
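Because the model ends with a `Normalize()` module, output embeddings have unit L2 length, so cosine similarity reduces to a plain dot product. A stdlib sketch of that identity (the 2-dimensional vectors are illustrative):

```python
import math

def normalize(v):
    """Scale a vector to unit (L2) length, as the Normalize() module does."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
# For unit vectors, cosine similarity is just the dot product.
cos = sum(x * y for x, y in zip(a, b))
print(round(cos, 4))  # → 0.96
```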
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("GIST-small-Embedding-v0-4_batch_10_epoch_all_data_en_unique_split")
+ # Run inference
+ sentences = [
+     'A security incident must inform all affected internal and external bodies in a timely manner. It is necessary to check whether the Data Protection Officer, the Works and Staff Council and employees from the Legal Department need to be involved. Similarly, the reporting requirements for authorities and regulated sectors must be taken into account. It is also necessary to ensure that relevant bodies are informed of the necessary measures.',
+     'The organization shall plan and prepare for managing information security incidents by defining, establishing and communicating information security incident management processes, roles and responsibilities.',
+     'A.5.24',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### Triplet
+ * Dataset: `GIST-small-Embedding-v0-4_batch_10_epoch_all_data_en_unique_split_robustness_42_eval`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+ 
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.8762** |
+ | dot_accuracy        | 0.095      |
+ | manhattan_accuracy  | 0.8698     |
+ | euclidean_accuracy  | 0.8762     |
+ | max_accuracy        | 0.8762     |
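Triplet accuracy here is the fraction of evaluation triplets for which the anchor scores higher against its positive than against its negative. A stdlib sketch with made-up similarity scores (not values from this evaluation):

```python
# Triplet accuracy: share of (anchor, positive, negative) triplets where
# sim(anchor, positive) > sim(anchor, negative). Scores below are made up.
def triplet_accuracy(pos_scores, neg_scores):
    correct = sum(p > n for p, n in zip(pos_scores, neg_scores))
    return correct / len(pos_scores)

pos = [0.91, 0.75, 0.60, 0.82]   # sim(anchor, positive)
neg = [0.40, 0.80, 0.20, 0.30]   # sim(anchor, negative)
print(triplet_accuracy(pos, neg))  # → 0.75 (3 of 4 triplets ranked correctly)
```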
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 3,414 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, <code>ISO_ID</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | ISO_ID | negative |
+   |:--------|:-------|:---------|:-------|:---------|
+   | type    | string | string | string | string |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 79.84 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 23.34 tokens</li><li>max: 192 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 6.99 tokens</li><li>max: 7 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 22.91 tokens</li><li>max: 154 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | ISO_ID | negative |
+   |:-------|:---------|:-------|:---------|
+   | <code>System components in the area of responsibility of the Cloud Service Provider for the provision of the cloud service are automatically checked for known vulnerabilities at least once a month in accordance with the policies for handling vulnerabilities (cf. OPS-18), the severity is assessed in accordance with defined criteria and measures for timely remediation or mitigation are initiated within defined time windows.</code> | <code>Information about technical vulnerabilities of information systems in use shall be obtained, the organization’s exposure to such vulnerabilities shall be evaluated and appropriate measures shall be taken.</code> | <code>A.8.8</code> | <code>Information processing facilities shall be implemented with redundancy sufficient to meet availability requirements.</code> |
+   | <code>System components in the area of responsibility of the Cloud Service Provider for the provision of the cloud service are automatically checked for known vulnerabilities at least once a month in accordance with the policies for handling vulnerabilities (cf. OPS-18), the severity is assessed in accordance with defined criteria and measures for timely remediation or mitigation are initiated within defined time windows.</code> | <code>Changes to information processing facilities and information systems shall be subject to change management procedures.</code> | <code>A.8.32</code> | <code>Rules for the effective use of cryptography, including cryptographic key management, shall be defined and implemented.</code> |
+   | <code>The Cloud Service Provider retains the generated log data and keeps these in an appropriate, unchangeable and aggregated form, regardless of the source of such data, so that a central, authorised evaluation of the data is possible. Log data is deleted if it is no longer required for the purpose for which they were collected. <br><br>Between logging servers and the assets to be logged, authentication takes place to protect the integrity and authenticity of the information transmitted and stored. The transfer takes place using state-of-the-art encryption or a dedicated administration network (out-of-band management).</code> | <code>Logs that record activities, exceptions, faults and other relevant events shall be produced, stored, protected and analysed.</code> | <code>A.8.15</code> | <code>Configurations, including security configurations, of hardware, software, services and networks shall be established, documented, implemented, monitored and reviewed.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
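MultipleNegativesRankingLoss treats the other positives in a batch as negatives: for each anchor it applies a softmax cross-entropy over scaled similarity scores, pushing the true positive's score above the rest. A stdlib sketch for a single anchor, using `scale=20.0` from the config above (the similarity scores themselves are illustrative):

```python
import math

def mnrl_one_anchor(pos_sim, neg_sims, scale=20.0):
    """-log softmax of the positive over [positive] + in-batch negatives."""
    logits = [scale * s for s in [pos_sim] + neg_sims]
    log_denominator = math.log(sum(math.exp(l) for l in logits))
    return log_denominator - logits[0]

# One anchor with its positive (sim 0.5) and three in-batch negatives.
loss = mnrl_one_anchor(0.5, [0.4, 0.3, 0.2])
print(round(loss, 3))  # → 0.145
```

As the positive's similarity pulls ahead of the negatives', the loss falls toward zero, which is exactly the ranking behaviour the training objective rewards.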
+ 
+ ### Evaluation Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 937 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, <code>ISO_ID</code>, and <code>negative</code>
+ * Approximate statistics based on the first 937 samples:
+   |         | anchor | positive | ISO_ID | negative |
+   |:--------|:-------|:---------|:-------|:---------|
+   | type    | string | string | string | string |
+   | details | <ul><li>min: 12 tokens</li><li>mean: 76.9 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 41.55 tokens</li><li>max: 495 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 6.91 tokens</li><li>max: 7 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 40.68 tokens</li><li>max: 495 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | ISO_ID | negative |
+   |:-------|:---------|:-------|:---------|
+   | <code>The Cloud Service Provider's internal and external employees are required by the employment terms and conditions to comply with applicable policies and instructions relating to information security.<br><br>The information security policy, and the policies and instructions based on it, are to be acknowledged by the internal and external personnel in a documented form before access is granted to any cloud customer data or system components under the responsibility of the Cloud Service Provider used to provide the cloud service in the production environment.</code> | <code>The employment contractual agreements shall state the personnel’s and the organization’s responsibilities for information security.</code> | <code>A.6.2</code> | <code>The organization shall establish and implement procedures for the identification, collection, acquisition and preservation of evidence related to information security events.</code> |
+   | <code>The Cloud Service Provider has established procedures for inventorying assets.<br><br>The inventory is performed automatically and/or by the people or teams responsible for the assets to ensure complete, accurate, valid and consistent inventory throughout the asset lifecycle.<br><br>Assets are recorded with the information needed to apply the Risk Management Procedure (Cf. OIS-07), including the measures taken to manage these risks throughout the asset lifecycle. Changes to this information are logged.</code> | <code>An inventory of information and other associated assets, including owners, shall be developed and maintained.</code> | <code>A.5.9</code> | <code>Access rights to information and other associated assets shall be provisioned, reviewed, modified and removed in accordance with the organization’s topic-specific policy on and rules for access control.</code> |
+   | <code>The Cloud Service Provider provides a training program for regular, target group-oriented security training and awareness for internal and external employees on standards and methods of secure software development and provision as well as on how to use the tools used for this purpose. The program is regularly reviewed and updated with regard to the applicable policies and instructions, the assigned roles and responsibilities and the tools used.</code> | <code>The organization shall:<br>a) determine the necessary competence of person(s) doing work under its control that affects its information security performance;<br>b) ensure that these persons are competent on the basis of appropriate education, training, or experience;<br>c) where applicable, take actions to acquire the necessary competence, and evaluate the effectiveness of the actions taken; and<br>d) retain appropriate documented information as evidence of competence.<br>NOTE Applicable actions can include, for example: the provision of training to, the mentoring of, or the re- assignment of current employees; or the hiring or contracting of competent persons.</code> | <code>7.2</code> | <code>Knowledge gained from information security incidents shall be used to strengthen and improve the information security controls.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `num_train_epochs`: 10
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `ddp_find_unused_parameters`: True
+ - `batch_sampler`: no_duplicates
+ 
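With `warmup_ratio: 0.1` and the default `linear` scheduler, the learning rate ramps up from zero over the first 10% of training steps and then decays linearly to zero. A stdlib sketch of that schedule (the total step count here is illustrative; the base LR of 5e-05 matches the hyperparameters listed):

```python
# Linear warmup followed by linear decay, as with warmup_ratio=0.1 and the
# default `linear` scheduler. total_steps below is illustrative.
def lr_at(step, total_steps=1000, base_lr=5e-05, warmup_ratio=0.1):
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps          # ramp up
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)  # decay

print(lr_at(50))    # halfway through warmup: half the peak LR
print(lr_at(100))   # end of warmup: peak LR (5e-05)
print(lr_at(1000))  # final step: 0.0
```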
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: True
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: True
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ 
+ </details>
+ 
+ ### Training Logs
+ | Epoch  | Step | Training Loss | loss   | GIST-small-Embedding-v0-4_batch_10_epoch_all_data_en_unique_split_robustness_42_eval_cosine_accuracy |
+ |:------:|:----:|:-------------:|:------:|:----------------------------------------------------------------------------------------------------:|
+ | 0.9977 | 425  | 1.7795        | 1.4178 | 0.8036 |
+ | 1.9977 | 850  | 1.2852        | 1.1081 | 0.8591 |
+ | 2.9977 | 1275 | 1.0536        | 1.0428 | 0.8698 |
+ | 3.9977 | 1700 | 0.9389        | 1.0188 | 0.8741 |
+ | 4.9977 | 2125 | 0.8879        | 1.0129 | 0.8709 |
+ | 5.9977 | 2550 | 0.8557        | 1.0079 | 0.8698 |
+ | 6.9977 | 2975 | 0.8355        | 1.0076 | 0.8719 |
+ | 7.9977 | 3400 | 0.8151        | 1.0067 | 0.8751 |
+ | 8.9977 | 3825 | 0.8228        | 1.0065 | 0.8751 |
+ | 9.9977 | 4250 | 0.8174        | 1.0067 | 0.8762 |
+ 
+ 
+ ### Framework Versions
+ - Python: 3.10.14
+ - Sentence Transformers: 3.1.0
+ - Transformers: 4.45.1
+ - PyTorch: 2.4.1+cu121
+ - Accelerate: 0.34.2
+ - Datasets: 3.0.1
+ - Tokenizers: 0.20.0
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "_name_or_path": "avsolatorio/GIST-small-Embedding-v0",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.45.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.0",
+     "transformers": "4.45.1",
+     "pytorch": "2.4.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d990b6ef6d2741ade04d1fd25e7acf7ac756a3ea52e626af22c15be5b6fa3872
+ size 66742184
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
runs/Oct28_13-36-18_7fc723fca212/events.out.tfevents.1730122584.7fc723fca212.223.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9cc1bcbb81b8294e60eda38b7d9c4d33256644af1f6cce11e7dcfde26a9eae85
+ size 16860
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:941b26cae7eea7bf38d1f26b5ebfcdfdae7b7d832c3f3c8ba7dc99921b254ed7
+ size 5688
vocab.txt ADDED
The diff for this file is too large to render. See raw diff