---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- 100K<n<1M
- loss:MultipleNegativesRankingLoss
base_model: microsoft/mpnet-base
metrics:
- cosine_accuracy
- dot_accuracy
- manhattan_accuracy
- euclidean_accuracy
- max_accuracy
widget:
- source_sentence: The strangely dressed guys, one wearing an orange wig, sunglasses
    with peace signs, and a karate costume with an orannge belt, another wearing a
    curly blue wig, heart shaped sunglasses, and a karate outfit painted with leaves,
    and the third wearing pink underwear, a black afro, and giant sunglasses.
  sentences:
  - A blonde female is reaching into a golf hole while holding two golf balls.
  - There are people wearing outfits.
  - The people are naked.
- source_sentence: A group of children playing and having a good time.
  sentences:
  - The kids are together.
  - The children are reading books.
  - People are pointing at a Middle-aged woman.
- source_sentence: Three children dressed in winter clothes are walking through the
    woods while pushing cargo along.
  sentences:
  - A woman is sitting.
  - Three childre are dressed in summer clothes.
  - Three children are dressed in winter clothes.
- source_sentence: A young child is enjoying the water and rock scenery with their
    dog.
  sentences:
  - The child and dog are enjoying some fresh air.
  - The teenage boy is taking his cat for a walk beside the water.
  - A lady in blue has birds around her.
- source_sentence: 'Boca da Corrida Encumeada (moderate; 5 hours): views of Curral
    das Freiras and the valley of Ribeiro do Poco.'
  sentences:
  - 'Boca da Corrida Encumeada is a moderate text that takes 5 hours to complete. '
  - This chapter is in the advance category.
  - I think it is something that we need.
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 118.81134392463773
  energy_consumed: 0.30566177669432554
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 1.661
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: MPNet base trained on AllNLI triplets
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: all nli dev
      type: all-nli-dev
    metrics:
    - type: cosine_accuracy
      value: 0.9003645200486027
      name: Cosine Accuracy
    - type: dot_accuracy
      value: 0.09705346294046173
      name: Dot Accuracy
    - type: manhattan_accuracy
      value: 0.8968712029161604
      name: Manhattan Accuracy
    - type: euclidean_accuracy
      value: 0.8974787363304981
      name: Euclidean Accuracy
    - type: max_accuracy
      value: 0.9003645200486027
      name: Max Accuracy
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: all nli test
      type: all-nli-test
    metrics:
    - type: cosine_accuracy
      value: 0.9149644424269935
      name: Cosine Accuracy
    - type: dot_accuracy
      value: 0.08564079285822364
      name: Dot Accuracy
    - type: manhattan_accuracy
      value: 0.911484339536995
      name: Manhattan Accuracy
    - type: euclidean_accuracy
      value: 0.9134513542139506
      name: Euclidean Accuracy
    - type: max_accuracy
      value: 0.9149644424269935
      name: Max Accuracy
---

# MPNet base trained on AllNLI triplets

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) on the [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) <!-- at revision 6996ce1e91bd2a9c7d7f61daec37463394f73f09 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
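
Concretely, the Pooling module above performs mean pooling: it averages the Transformer's token embeddings, skipping padding tokens, to produce one 768-dimensional vector per input. A simplified sketch of that step (illustrative only; the library handles this internally):

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).float()    # zero out padding positions
    summed = (token_embeddings * mask).sum(dim=1)  # sum of real token embeddings
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per input
    return summed / counts                         # (batch, 768) sentence embeddings
```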

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/mpnet-base-all-nli-triplet")
# Run inference
sentences = [
    'Then he ran.',
    'The people are running.',
    'The man is on his bike.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
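
Each entry of `similarities` is the cosine similarity between a pair of embeddings, so the matrix can be used directly. For example, to find which of the other sentences is closest to the first (assuming, as in recent Sentence Transformers releases, that `model.similarity` returns a `torch.Tensor`):

```python
# Index of the sentence most similar to sentences[0], excluding itself
best = similarities[0, 1:].argmax().item() + 1
print(sentences[best])
# likely 'The people are running.'
```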

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Triplet
* Dataset: `all-nli-dev`
* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric             | Value      |
|:-------------------|:-----------|
| cosine_accuracy    | 0.9004     |
| dot_accuracy       | 0.0971     |
| manhattan_accuracy | 0.8969     |
| euclidean_accuracy | 0.8975     |
| **max_accuracy**   | **0.9004** |

#### Triplet
* Dataset: `all-nli-test`
* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric             | Value     |
|:-------------------|:----------|
| cosine_accuracy    | 0.915     |
| dot_accuracy       | 0.0856    |
| manhattan_accuracy | 0.9115    |
| euclidean_accuracy | 0.9135    |
| **max_accuracy**   | **0.915** |
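
These numbers can be reproduced by running the linked `TripletEvaluator` against the test split; a minimal sketch, assuming the `triplet` subset of the dataset with `anchor`/`positive`/`negative` columns:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("tomaarsen/mpnet-base-all-nli-triplet")
test = load_dataset("sentence-transformers/all-nli", "triplet", split="test")

# Accuracy = fraction of triplets where the anchor is closer to the
# positive than to the negative, per distance function
evaluator = TripletEvaluator(
    anchors=test["anchor"],
    positives=test["positive"],
    negatives=test["negative"],
    name="all-nli-test",
)
print(evaluator(model))
```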

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### sentence-transformers/all-nli

* Dataset: [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
* Size: 100,000 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details | <ul><li>min: 7 tokens</li><li>mean: 10.46 tokens</li><li>max: 46 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 12.81 tokens</li><li>max: 40 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 13.4 tokens</li><li>max: 50 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>A person on a horse jumps over a broken down airplane.</code> | <code>A person is outdoors, on a horse.</code> | <code>A person is at a diner, ordering an omelette.</code> |
  | <code>Children smiling and waving at camera</code> | <code>There are children present</code> | <code>The kids are frowning</code> |
  | <code>A boy is jumping on skateboard in the middle of a red bridge.</code> | <code>The boy does a skateboarding trick.</code> | <code>The boy skates down the sidewalk.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
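
For intuition: with this loss, each anchor in a batch treats its own positive as the correct candidate among all positives and negatives in the batch, and the `scale`-multiplied cosine similarities are optimized with softmax cross-entropy. A conceptual sketch, not the library's exact implementation:

```python
import torch
import torch.nn.functional as F

def mnrl(anchors, positives, negatives, scale=20.0):
    # anchors, positives, negatives: (batch, dim) embeddings
    candidates = torch.cat([positives, negatives])  # (2 * batch, dim)
    # Cosine similarity of every anchor against every candidate: (batch, 2 * batch)
    scores = scale * F.cosine_similarity(
        anchors.unsqueeze(1), candidates.unsqueeze(0), dim=-1
    )
    # Anchor i's positive sits in column i; everything else is a negative
    labels = torch.arange(anchors.size(0))
    return F.cross_entropy(scores, labels)
```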

### Evaluation Dataset

#### sentence-transformers/all-nli

* Dataset: [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
* Size: 6,584 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details | <ul><li>min: 6 tokens</li><li>mean: 17.95 tokens</li><li>max: 63 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 9.78 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 10.35 tokens</li><li>max: 29 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>Two women are embracing while holding to go packages.</code> | <code>Two woman are holding packages.</code> | <code>The men are fighting outside a deli.</code> |
  | <code>Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.</code> | <code>Two kids in numbered jerseys wash their hands.</code> | <code>Two kids in jackets walk to school.</code> |
  | <code>A man selling donuts to a customer during a world exhibition event held in the city of Angeles</code> | <code>A man selling donuts to a customer.</code> | <code>A woman drinks her coffee in a small cafe.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates
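
These values map directly onto `SentenceTransformerTrainingArguments`. A minimal sketch of a comparable training run (the output directory and dataset slicing are illustrative assumptions; the exact training script is not included in this card):

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("microsoft/mpnet-base")
dataset = load_dataset("sentence-transformers/all-nli", "triplet")
train_dataset = dataset["train"].select(range(100_000))

args = SentenceTransformerTrainingArguments(
    output_dir="mpnet-base-all-nli-triplet",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    # no_duplicates keeps identical texts from acting as in-batch negatives
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=dataset["dev"],
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
```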

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch | Step | Training Loss | Validation Loss | all-nli-dev_max_accuracy | all-nli-test_max_accuracy |
|:-----:|:----:|:-------------:|:---------------:|:------------------------:|:-------------------------:|
| 0     | 0    | -             | -               | 0.6832                   | -                         |
| 0.016 | 100  | 2.6355        | 1.0725          | 0.7924                   | -                         |
| 0.032 | 200  | 0.9206        | 0.8342          | 0.8080                   | -                         |
| 0.048 | 300  | 1.2567        | 0.7855          | 0.8133                   | -                         |
| 0.064 | 400  | 0.7949        | 0.8857          | 0.7974                   | -                         |
| 0.08  | 500  | 0.7583        | 0.9487          | 0.7872                   | -                         |
| 0.096 | 600  | 1.0022        | 1.1312          | 0.7848                   | -                         |
| 0.112 | 700  | 0.8178        | 1.2282          | 0.7895                   | -                         |
| 0.128 | 800  | 0.9997        | 1.5132          | 0.7488                   | -                         |
| 0.144 | 900  | 1.1173        | 1.4605          | 0.7473                   | -                         |
| 0.16  | 1000 | 1.0089        | 1.3794          | 0.7543                   | -                         |
| 0.176 | 1100 | 1.0235        | 1.4188          | 0.7640                   | -                         |
| 0.192 | 1200 | 1.0031        | 1.2465          | 0.7570                   | -                         |
| 0.208 | 1300 | 0.8286        | 1.4176          | 0.7426                   | -                         |
| 0.224 | 1400 | 0.8411        | 1.1914          | 0.7600                   | -                         |
| 0.24  | 1500 | 0.8389        | 1.1719          | 0.7820                   | -                         |
| 0.256 | 1600 | 0.7144        | 1.1167          | 0.7691                   | -                         |
| 0.272 | 1700 | 0.881         | 1.0747          | 0.7902                   | -                         |
| 0.288 | 1800 | 0.8657        | 1.1576          | 0.7966                   | -                         |
| 0.304 | 1900 | 0.7323        | 1.0122          | 0.8322                   | -                         |
| 0.32  | 2000 | 0.6578        | 1.1248          | 0.8273                   | -                         |
| 0.336 | 2100 | 0.6037        | 1.1194          | 0.8269                   | -                         |
| 0.352 | 2200 | 0.641         | 1.1410          | 0.8341                   | -                         |
| 0.368 | 2300 | 0.7843        | 1.0600          | 0.8328                   | -                         |
| 0.384 | 2400 | 0.8222        | 0.9988          | 0.8161                   | -                         |
| 0.4   | 2500 | 0.7287        | 1.2026          | 0.8395                   | -                         |
| 0.416 | 2600 | 0.6035        | 0.8802          | 0.8273                   | -                         |
| 0.432 | 2700 | 0.8275        | 1.1631          | 0.8458                   | -                         |
| 0.448 | 2800 | 0.8483        | 0.9218          | 0.8316                   | -                         |
| 0.464 | 2900 | 0.8813        | 1.1187          | 0.8147                   | -                         |
| 0.48  | 3000 | 0.7408        | 0.9582          | 0.8246                   | -                         |
| 0.496 | 3100 | 0.7886        | 0.9364          | 0.8261                   | -                         |
| 0.512 | 3200 | 0.6064        | 0.8338          | 0.8302                   | -                         |
| 0.528 | 3300 | 0.6415        | 0.7895          | 0.8650                   | -                         |
| 0.544 | 3400 | 0.5766        | 0.7525          | 0.8571                   | -                         |
| 0.56  | 3500 | 0.6212        | 0.8605          | 0.8572                   | -                         |
| 0.576 | 3600 | 0.5773        | 0.7460          | 0.8419                   | -                         |
| 0.592 | 3700 | 0.6104        | 0.7480          | 0.8580                   | -                         |
| 0.608 | 3800 | 0.5754        | 0.7215          | 0.8657                   | -                         |
| 0.624 | 3900 | 0.5525        | 0.7900          | 0.8630                   | -                         |
| 0.64  | 4000 | 0.7802        | 0.7443          | 0.8612                   | -                         |
| 0.656 | 4100 | 0.9796        | 0.7756          | 0.8748                   | -                         |
| 0.672 | 4200 | 0.9355        | 0.6917          | 0.8796                   | -                         |
| 0.688 | 4300 | 0.7081        | 0.6442          | 0.8832                   | -                         |
| 0.704 | 4400 | 0.6868        | 0.6395          | 0.8891                   | -                         |
| 0.72  | 4500 | 0.5964        | 0.5983          | 0.8820                   | -                         |
| 0.736 | 4600 | 0.6618        | 0.5754          | 0.8861                   | -                         |
| 0.752 | 4700 | 0.6957        | 0.6177          | 0.8803                   | -                         |
| 0.768 | 4800 | 0.6375        | 0.5577          | 0.8881                   | -                         |
| 0.784 | 4900 | 0.5481        | 0.5496          | 0.8835                   | -                         |
| 0.8   | 5000 | 0.6626        | 0.5728          | 0.8949                   | -                         |
| 0.816 | 5100 | 0.5192        | 0.5329          | 0.8935                   | -                         |
| 0.832 | 5200 | 0.5856        | 0.5188          | 0.8935                   | -                         |
| 0.848 | 5300 | 0.5142        | 0.5252          | 0.8920                   | -                         |
| 0.864 | 5400 | 0.6404        | 0.5641          | 0.8885                   | -                         |
| 0.88  | 5500 | 0.5466        | 0.5209          | 0.8929                   | -                         |
| 0.896 | 5600 | 0.575         | 0.5170          | 0.8961                   | -                         |
| 0.912 | 5700 | 0.626         | 0.5095          | 0.9001                   | -                         |
| 0.928 | 5800 | 0.5631        | 0.4817          | 0.8984                   | -                         |
| 0.944 | 5900 | 0.7301        | 0.4996          | 0.8984                   | -                         |
| 0.96  | 6000 | 0.7712        | 0.5160          | 0.9014                   | -                         |
| 0.976 | 6100 | 0.6203        | 0.5000          | 0.9007                   | -                         |
| 0.992 | 6200 | 0.0005        | 0.4996          | 0.9004                   | -                         |
| 1.0   | 6250 | -             | -               | -                        | 0.9150                    |


### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.306 kWh
- **Carbon Emitted**: 0.119 kg of CO2
- **Hours Used**: 1.661 hours
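
A figure like this can be obtained by wrapping the training run in CodeCarbon's tracker; a minimal standalone sketch:

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()  # samples CPU/GPU/RAM power while running
tracker.start()
# ... run training here ...
emissions_kg = tracker.stop()  # total emissions in kg CO2-eq
print(f"{emissions_kg:.3f} kg CO2-eq")
```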

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.0.0.dev0
- Transformers: 4.41.1
- PyTorch: 2.3.0+cu121
- Accelerate: 0.30.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->