manu committed on
Commit b1980db
1 Parent(s): 264b250

Upload folder using huggingface_hub

final/README.md CHANGED
@@ -7,7 +7,7 @@ tags:
  - sentence-similarity
  - feature-extraction
  - dataset_size:100K<n<1M
- - loss:MultipleNegativesRankingLoss
  base_model: FacebookAI/xlm-roberta-large
  metrics:
  - cosine_accuracy
@@ -18,29 +18,29 @@ metrics:
  widget:
  - source_sentence: The boy scowls
  sentences:
- - People are around a fire
- - Boy playing baseball.
- - The girls are at school.
- - source_sentence: an eagle flies
- sentences:
- - A man floats up a ladder.
- - He is playing a song.
- - The t-shirt is white.
  - source_sentence: A woman sings.
  sentences:
- - The woman is outdoors.
- - the animal is running
- - A man is playing indoors.
  - source_sentence: A bird flying.
  sentences:
- - No one is on a canoe.
- - A man is on his feet.
- - Two men listen to music.
  - source_sentence: There's a dock
  sentences:
- - The man is performing.
- - Five people on a path
- - The elephant sits on a dog
  pipeline_tag: sentence-similarity
  model-index:
  - name: SentenceTransformer based on FacebookAI/xlm-roberta-large
@@ -53,19 +53,19 @@ model-index:
  type: all-nli-dev
  metrics:
  - type: cosine_accuracy
- value: 0.452
  name: Cosine Accuracy
  - type: dot_accuracy
- value: 0.34
  name: Dot Accuracy
  - type: manhattan_accuracy
- value: 0.456
  name: Manhattan Accuracy
  - type: euclidean_accuracy
- value: 0.452
  name: Euclidean Accuracy
  - type: max_accuracy
- value: 0.456
  name: Max Accuracy
  - task:
  type: triplet
@@ -75,19 +75,19 @@ model-index:
  type: all-nli-test
  metrics:
  - type: cosine_accuracy
- value: 0.481
  name: Cosine Accuracy
  - type: dot_accuracy
- value: 0.364
  name: Dot Accuracy
  - type: manhattan_accuracy
- value: 0.48
  name: Manhattan Accuracy
  - type: euclidean_accuracy
- value: 0.481
  name: Euclidean Accuracy
  - type: max_accuracy
- value: 0.481
  name: Max Accuracy
  ---
@@ -142,8 +142,8 @@ model = SentenceTransformer("sentence_transformers_model_id")
  # Run inference
  sentences = [
  "There's a dock",
- 'The man is performing.',
- 'Five people on a path',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
@@ -189,11 +189,11 @@ You can finetune this model on your own dataset.
  | Metric | Value |
  |:-------------------|:----------|
- | cosine_accuracy | 0.452 |
- | dot_accuracy | 0.34 |
- | manhattan_accuracy | 0.456 |
- | euclidean_accuracy | 0.452 |
- | **max_accuracy** | **0.456** |

  #### Triplet
  * Dataset: `all-nli-test`
@@ -201,11 +201,11 @@ You can finetune this model on your own dataset.
  | Metric | Value |
  |:-------------------|:----------|
- | cosine_accuracy | 0.481 |
- | dot_accuracy | 0.364 |
- | manhattan_accuracy | 0.48 |
- | euclidean_accuracy | 0.481 |
- | **max_accuracy** | **0.481** |

  <!--
  ## Bias, Risks and Limitations
@@ -239,7 +239,7 @@ You can finetune this model on your own dataset.
  | <code>A person on a horse jumps over a broken down airplane.</code> | <code>A person is outdoors, on a horse.</code> | <code>A person is at a diner, ordering an omelette.</code> |
  | <code>Children smiling and waving at camera</code> | <code>There are children present</code> | <code>The kids are frowning</code> |
  | <code>A boy is jumping on skateboard in the middle of a red bridge.</code> | <code>The boy does a skateboarding trick.</code> | <code>The boy skates down the sidewalk.</code> |
- * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
  "scale": 20.0,
@@ -265,7 +265,7 @@ You can finetune this model on your own dataset.
  | <code>Two women are embracing while holding to go packages.</code> | <code>Two woman are holding packages.</code> | <code>The men are fighting outside a deli.</code> |
  | <code>Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.</code> | <code>Two kids in numbered jerseys wash their hands.</code> | <code>Two kids in jackets walk to school.</code> |
  | <code>A man selling donuts to a customer during a world exhibition event held in the city of Angeles</code> | <code>A man selling donuts to a customer.</code> | <code>A woman drinks her coffee in a small cafe.</code> |
- * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
  "scale": 20.0,
@@ -281,7 +281,7 @@ You can finetune this model on your own dataset.
  - `per_device_eval_batch_size`: 16
  - `num_train_epochs`: 1
  - `warmup_ratio`: 0.1
- - `fp16`: True
  - `batch_sampler`: no_duplicates

  #### All Hyperparameters
@@ -324,8 +324,8 @@ You can finetune this model on your own dataset.
  - `data_seed`: None
  - `jit_mode_eval`: False
  - `use_ipex`: False
- - `bf16`: False
- - `fp16`: True
  - `fp16_opt_level`: O1
  - `half_precision_backend`: auto
  - `bf16_full_eval`: False
@@ -401,70 +401,70 @@ You can finetune this model on your own dataset.
  ### Training Logs
  | Epoch | Step | Training Loss | loss | all-nli-dev_max_accuracy | all-nli-test_max_accuracy |
  |:-----:|:----:|:-------------:|:------:|:------------------------:|:-------------------------:|
- | 0 | 0 | - | - | 0.616 | - |
- | 0.016 | 100 | 3.2768 | 1.8053 | 0.833 | - |
- | 0.032 | 200 | 1.1697 | 1.2878 | 0.861 | - |
- | 0.048 | 300 | 1.372 | 1.2466 | 0.861 | - |
- | 0.064 | 400 | 1.0476 | 1.2291 | 0.863 | - |
- | 0.08 | 500 | 0.8588 | 1.5259 | 0.838 | - |
- | 0.096 | 600 | 2.9781 | 3.4309 | 0.463 | - |
- | 0.112 | 700 | 3.4982 | 3.4309 | 0.457 | - |
- | 0.128 | 800 | 3.467 | 3.4309 | 0.479 | - |
- | 0.144 | 900 | 3.4665 | 3.4309 | 0.452 | - |
- | 0.16 | 1000 | 3.4664 | 3.4309 | 0.477 | - |
- | 0.176 | 1100 | 3.4663 | 3.4309 | 0.458 | - |
- | 0.192 | 1200 | 3.4661 | 3.4309 | 0.462 | - |
- | 0.208 | 1300 | 3.4658 | 3.4309 | 0.45 | - |
- | 0.224 | 1400 | 3.4661 | 3.4309 | 0.481 | - |
- | 0.24 | 1500 | 3.4877 | 3.4309 | 0.464 | - |
- | 0.256 | 1600 | 3.4675 | 3.4309 | 0.462 | - |
- | 0.272 | 1700 | 3.4665 | 3.4309 | 0.488 | - |
- | 0.288 | 1800 | 3.4667 | 3.4309 | 0.492 | - |
- | 0.304 | 1900 | 3.4664 | 3.4309 | 0.455 | - |
- | 0.32 | 2000 | 3.4661 | 3.4309 | 0.453 | - |
- | 0.336 | 2100 | 3.4666 | 3.4309 | 0.477 | - |
- | 0.352 | 2200 | 3.4683 | 3.4309 | 0.48 | - |
- | 0.368 | 2300 | 3.4663 | 3.4309 | 0.469 | - |
- | 0.384 | 2400 | 3.4667 | 3.4309 | 0.448 | - |
- | 0.4 | 2500 | 3.4669 | 3.4309 | 0.499 | - |
- | 0.416 | 2600 | 3.4661 | 3.4309 | 0.453 | - |
- | 0.432 | 2700 | 3.4656 | 3.4309 | 0.467 | - |
- | 0.448 | 2800 | 3.4662 | 3.4309 | 0.507 | - |
- | 0.464 | 2900 | 3.4902 | 3.4309 | 0.473 | - |
- | 0.48 | 3000 | 3.4663 | 3.4309 | 0.469 | - |
- | 0.496 | 3100 | 3.554 | 3.4309 | 0.46 | - |
- | 0.512 | 3200 | 3.4664 | 3.4309 | 0.455 | - |
- | 0.528 | 3300 | 3.4668 | 3.4309 | 0.46 | - |
- | 0.544 | 3400 | 3.4661 | 3.4309 | 0.492 | - |
- | 0.56 | 3500 | 3.4667 | 3.4309 | 0.432 | - |
- | 0.576 | 3600 | 3.4668 | 3.4309 | 0.486 | - |
- | 0.592 | 3700 | 3.4666 | 3.4309 | 0.469 | - |
- | 0.608 | 3800 | 3.4669 | 3.4309 | 0.473 | - |
- | 0.624 | 3900 | 3.4658 | 3.4309 | 0.487 | - |
- | 0.64 | 4000 | 3.4663 | 3.4309 | 0.448 | - |
- | 0.656 | 4100 | 3.4663 | 3.4309 | 0.465 | - |
- | 0.672 | 4200 | 3.4664 | 3.4309 | 0.484 | - |
- | 0.688 | 4300 | 3.4663 | 3.4309 | 0.469 | - |
- | 0.704 | 4400 | 3.4661 | 3.4309 | 0.478 | - |
- | 0.72 | 4500 | 3.4669 | 3.4309 | 0.467 | - |
- | 0.736 | 4600 | 3.4664 | 3.4309 | 0.455 | - |
- | 0.752 | 4700 | 3.4664 | 3.4309 | 0.481 | - |
- | 0.768 | 4800 | 3.4659 | 3.4309 | 0.466 | - |
- | 0.784 | 4900 | 3.466 | 3.4309 | 0.451 | - |
- | 0.8 | 5000 | 3.466 | 3.4309 | 0.473 | - |
- | 0.816 | 5100 | 3.4664 | 3.4309 | 0.44 | - |
- | 0.832 | 5200 | 3.4658 | 3.4309 | 0.497 | - |
- | 0.848 | 5300 | 3.4664 | 3.4309 | 0.474 | - |
- | 0.864 | 5400 | 3.4658 | 3.4309 | 0.449 | - |
- | 0.88 | 5500 | 3.4662 | 3.4309 | 0.466 | - |
- | 0.896 | 5600 | 3.4663 | 3.4309 | 0.476 | - |
- | 0.912 | 5700 | 3.4667 | 3.4309 | 0.455 | - |
- | 0.928 | 5800 | 3.4669 | 3.4309 | 0.463 | - |
- | 0.944 | 5900 | 3.4657 | 3.4309 | 0.467 | - |
- | 0.96 | 6000 | 3.4671 | 3.4309 | 0.456 | - |
- | 0.976 | 6100 | 2.9471 | 3.4309 | 0.484 | - |
- | 0.992 | 6200 | 0.6929 | 3.4309 | 0.456 | - |
- | 1.0 | 6250 | - | - | - | 0.481 |


  ### Framework Versions
@@ -493,15 +493,15 @@ You can finetune this model on your own dataset.
  }
  ```

- #### MultipleNegativesRankingLoss
  ```bibtex
- @misc{henderson2017efficient,
- title={Efficient Natural Language Response Suggestion for Smart Reply},
- author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
- year={2017},
- eprint={1705.00652},
  archivePrefix={arXiv},
- primaryClass={cs.CL}
  }
  ```
 
  - sentence-similarity
  - feature-extraction
  - dataset_size:100K<n<1M
+ - loss:CachedMultipleNegativesRankingLoss
  base_model: FacebookAI/xlm-roberta-large
  metrics:
  - cosine_accuracy
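The tag switches from `MultipleNegativesRankingLoss` to `CachedMultipleNegativesRankingLoss`; both versions of the card list `"scale": 20.0` among the loss parameters, and the rest of that parameter block is cut off by the diff. The NumPy sketch below shows the ranking objective itself, assuming cosine similarity as the score and one hard negative per anchor, as in the (anchor, positive, negative) rows shown in the card. According to Gao et al. (2021), cited in the new card, the cached variant is meant to compute the same contrastive objective while splitting the batch into mini-batches and caching gradients to save memory, so the forward computation sketched here applies to both. This is an illustration, not the sentence-transformers source.

```python
import numpy as np

def cos_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """Multiple-negatives ranking loss with one hard negative per anchor.

    Anchor i must pick positive i out of all B positives and B hard negatives
    in the batch (2B candidates), via cross-entropy over scaled similarities.
    Assumption: cosine similarity is the scoring function.
    """
    candidates = np.concatenate([positives, negatives], axis=0)  # (2B, d)
    scores = scale * cos_sim(anchors, candidates)                # (B, 2B)
    scores = scores - scores.max(axis=1, keepdims=True)          # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    idx = np.arange(len(anchors))  # positive i sits at candidate column i
    return float(-log_probs[idx, idx].mean())

B, d = 16, 8
collapsed = np.ones((B, d))  # every embedding identical: all scores tie
print(mnrl_loss(collapsed, collapsed, collapsed), np.log(2 * B))

eye = np.eye(2 * B)
# Anchors match their positives exactly and are orthogonal to every other candidate.
print(mnrl_loss(eye[:B], eye[:B], eye[B:]))
```

When all embeddings collapse to one vector, every score ties and the loss reduces to ln(2B); with an assumed batch size of B = 16 that is ln 32 ≈ 3.466, a useful reference point when reading the training losses in the logs.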
 
  widget:
  - source_sentence: The boy scowls
  sentences:
+ - The boy is outside.
+ - The man is in a city.
+ - A woman at home.
  - source_sentence: A woman sings.
  sentences:
+ - The woman is singing.
+ - a man is wearing blue
+ - The boys are eating.
  - source_sentence: A bird flying.
  sentences:
+ - A butterfly flys freely.
+ - She checks her phone.
+ - A man is sleeping.
+ - source_sentence: an eagle flies
+ sentences:
+ - A butterfly flys freely.
+ - The men are together.
+ - A man is sleeping.
  - source_sentence: There's a dock
  sentences:
+ - There are people outdoors
+ - Boy playing baseball.
+ - A man is sleeping.
  pipeline_tag: sentence-similarity
  model-index:
  - name: SentenceTransformer based on FacebookAI/xlm-roberta-large
 
  type: all-nli-dev
  metrics:
  - type: cosine_accuracy
+ value: 0.941
  name: Cosine Accuracy
  - type: dot_accuracy
+ value: 0.062
  name: Dot Accuracy
  - type: manhattan_accuracy
+ value: 0.937
  name: Manhattan Accuracy
  - type: euclidean_accuracy
+ value: 0.938
  name: Euclidean Accuracy
  - type: max_accuracy
+ value: 0.941
  name: Max Accuracy
  - task:
  type: triplet
 
  type: all-nli-test
  metrics:
  - type: cosine_accuracy
+ value: 0.943
  name: Cosine Accuracy
  - type: dot_accuracy
+ value: 0.057
  name: Dot Accuracy
  - type: manhattan_accuracy
+ value: 0.947
  name: Manhattan Accuracy
  - type: euclidean_accuracy
+ value: 0.947
  name: Euclidean Accuracy
  - type: max_accuracy
+ value: 0.947
  name: Max Accuracy
  ---
 
  # Run inference
  sentences = [
  "There's a dock",
+ 'There are people outdoors',
+ 'Boy playing baseball.',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
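The inference snippet stops at printing the embedding shape, while the widget entries pair each `source_sentence` with candidate `sentences` to be scored against it. A minimal sketch of that ranking step follows, assuming cosine similarity as the score; the 3-dimensional vectors are made up and merely stand in for real `model.encode()` output.

```python
import numpy as np

def rank_candidates(source_emb, candidate_embs):
    """Rank candidate embeddings by cosine similarity to a source embedding.

    Returns (indices from most to least similar, cosine scores).
    """
    s = np.asarray(source_emb, dtype=float)
    c = np.asarray(candidate_embs, dtype=float)
    scores = (c @ s) / (np.linalg.norm(c, axis=1) * np.linalg.norm(s))
    return np.argsort(-scores), scores

# Made-up 3-d vectors standing in for model.encode() output (not real embeddings).
source = [0.9, 0.1, 0.0]
candidates = [
    [0.8, 0.3, 0.1],
    [0.0, 0.2, 0.9],
    [0.1, 0.9, 0.2],
]
order, scores = rank_candidates(source, candidates)
print(order, scores.round(3))
```

Because cosine similarity ignores vector length, rescaling any candidate embedding leaves its rank unchanged.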
 
  | Metric | Value |
  |:-------------------|:----------|
+ | cosine_accuracy | 0.941 |
+ | dot_accuracy | 0.062 |
+ | manhattan_accuracy | 0.937 |
+ | euclidean_accuracy | 0.938 |
+ | **max_accuracy** | **0.941** |

  #### Triplet
  * Dataset: `all-nli-test`
 
  | Metric | Value |
  |:-------------------|:----------|
+ | cosine_accuracy | 0.943 |
+ | dot_accuracy | 0.057 |
+ | manhattan_accuracy | 0.947 |
+ | euclidean_accuracy | 0.947 |
+ | **max_accuracy** | **0.947** |

  <!--
  ## Bias, Risks and Limitations
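The Triplet tables score (anchor, positive, negative) triplets: a triplet counts as correct when the anchor is closer to its positive than to its negative, and `max_accuracy` is the best of the four scoring functions (in every table above it equals the largest of the four rows). The sketch below is a simplified reading of those metric names, not the sentence-transformers `TripletEvaluator` source; the toy vectors are made up.

```python
import numpy as np

def triplet_accuracies(anchors, positives, negatives):
    """Fraction of triplets where the anchor is closer to its positive than to
    its negative, under four scoring functions (one row per triplet)."""
    a, p, n = (np.asarray(x, dtype=float) for x in (anchors, positives, negatives))

    def cos(x, y):
        # row-wise cosine similarity
        return (x * y).sum(axis=1) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))

    results = {
        # similarities: higher means closer
        "cosine_accuracy": np.mean(cos(a, p) > cos(a, n)),
        "dot_accuracy": np.mean((a * p).sum(axis=1) > (a * n).sum(axis=1)),
        # distances: lower means closer
        "manhattan_accuracy": np.mean(np.abs(a - p).sum(axis=1) < np.abs(a - n).sum(axis=1)),
        "euclidean_accuracy": np.mean(np.linalg.norm(a - p, axis=1) < np.linalg.norm(a - n, axis=1)),
    }
    results = {k: float(v) for k, v in results.items()}
    results["max_accuracy"] = max(results.values())
    return results

# Toy triplet (made-up vectors): the negative has a large norm, so the raw
# dot product prefers it even though angle and distance favour the positive.
print(triplet_accuracies([[1.0, 0.0]], [[1.0, 0.1]], [[5.0, 5.0]]))
```

The toy triplet shows one way `dot_accuracy` can disagree with the other three metrics: the raw dot product rewards vector length as well as direction.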
 
  | <code>A person on a horse jumps over a broken down airplane.</code> | <code>A person is outdoors, on a horse.</code> | <code>A person is at a diner, ordering an omelette.</code> |
  | <code>Children smiling and waving at camera</code> | <code>There are children present</code> | <code>The kids are frowning</code> |
  | <code>A boy is jumping on skateboard in the middle of a red bridge.</code> | <code>The boy does a skateboarding trick.</code> | <code>The boy skates down the sidewalk.</code> |
+ * Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
  ```json
  {
  "scale": 20.0,
 
  | <code>Two women are embracing while holding to go packages.</code> | <code>Two woman are holding packages.</code> | <code>The men are fighting outside a deli.</code> |
  | <code>Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.</code> | <code>Two kids in numbered jerseys wash their hands.</code> | <code>Two kids in jackets walk to school.</code> |
  | <code>A man selling donuts to a customer during a world exhibition event held in the city of Angeles</code> | <code>A man selling donuts to a customer.</code> | <code>A woman drinks her coffee in a small cafe.</code> |
+ * Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
  ```json
  {
  "scale": 20.0,
 
  - `per_device_eval_batch_size`: 16
  - `num_train_epochs`: 1
  - `warmup_ratio`: 0.1
+ - `bf16`: True
  - `batch_sampler`: no_duplicates

  #### All Hyperparameters
 
  - `data_seed`: None
  - `jit_mode_eval`: False
  - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
  - `fp16_opt_level`: O1
  - `half_precision_backend`: auto
  - `bf16_full_eval`: False
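The hyperparameter hunks swap fp16 mixed precision for bf16, and `final/config.json` now records `torch_dtype` as `bfloat16`. A commonly cited difference between the two 16-bit formats is dynamic range: bfloat16 keeps float32's 8-bit exponent, while float16 has a 5-bit exponent and overflows above 65504. The NumPy sketch below illustrates that range difference. NumPy has no native bfloat16 type, so the sketch approximates it by truncating float32 bit patterns; that truncation is an assumption of the sketch, since real conversions usually round to nearest even.

```python
import numpy as np

def to_bf16_truncated(x):
    """Approximate bfloat16 by zeroing the low 16 bits of each float32.
    Simplification: real conversions usually round to nearest even."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

values = np.array([1e-8, 3.14159, 7e4, 1e38], dtype=np.float32)

with np.errstate(over="ignore"):
    fp16 = values.astype(np.float16)   # 5-bit exponent: max finite value is 65504
bf16 = to_bf16_truncated(values)       # 8-bit exponent, same range as float32

for v, h, b in zip(values, fp16, bf16):
    print(f"{float(v):>12.6g}  fp16={float(h):<12.6g} bf16~{float(b):.6g}")
```

In this sketch the very small and the very large values become 0 or `inf` in float16 but stay finite and non-zero in the bfloat16 approximation, at the cost of fewer mantissa bits.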
 
  ### Training Logs
  | Epoch | Step | Training Loss | loss | all-nli-dev_max_accuracy | all-nli-test_max_accuracy |
  |:-----:|:----:|:-------------:|:------:|:------------------------:|:-------------------------:|
+ | 0 | 0 | - | - | 0.613 | - |
+ | 0.016 | 100 | 3.4639 | 3.4199 | 0.621 | - |
+ | 0.032 | 200 | 3.4496 | 3.1967 | 0.841 | - |
+ | 0.048 | 300 | 2.2928 | 1.0476 | 0.864 | - |
+ | 0.064 | 400 | 1.2217 | 0.9993 | 0.871 | - |
+ | 0.08 | 500 | 1.1075 | 1.2674 | 0.85 | - |
+ | 0.096 | 600 | 1.2113 | 1.2565 | 0.866 | - |
+ | 0.112 | 700 | 1.0326 | 1.3313 | 0.855 | - |
+ | 0.128 | 800 | 1.2326 | 1.3698 | 0.851 | - |
+ | 0.144 | 900 | 1.2897 | 1.2690 | 0.855 | - |
+ | 0.16 | 1000 | 1.275 | 1.1231 | 0.863 | - |
+ | 0.176 | 1100 | 1.0823 | 1.2453 | 0.853 | - |
+ | 0.192 | 1200 | 1.1933 | 1.1119 | 0.868 | - |
+ | 0.208 | 1300 | 1.0102 | 0.9491 | 0.86 | - |
+ | 0.224 | 1400 | 0.8738 | 1.0682 | 0.87 | - |
+ | 0.24 | 1500 | 0.9482 | 0.8546 | 0.89 | - |
+ | 0.256 | 1600 | 0.6985 | 0.9136 | 0.88 | - |
+ | 0.272 | 1700 | 0.9908 | 0.9539 | 0.873 | - |
+ | 0.288 | 1800 | 1.0166 | 0.9277 | 0.878 | - |
+ | 0.304 | 1900 | 0.9441 | 0.9000 | 0.886 | - |
+ | 0.32 | 2000 | 0.8911 | 0.8364 | 0.891 | - |
+ | 0.336 | 2100 | 0.6746 | 0.8585 | 0.883 | - |
+ | 0.352 | 2200 | 0.7379 | 0.8332 | 0.888 | - |
+ | 0.368 | 2300 | 0.896 | 0.7617 | 0.89 | - |
+ | 0.384 | 2400 | 0.7901 | 0.7351 | 0.887 | - |
+ | 0.4 | 2500 | 0.811 | 0.7855 | 0.89 | - |
+ | 0.416 | 2600 | 0.6723 | 0.6756 | 0.899 | - |
+ | 0.432 | 2700 | 0.8839 | 0.7839 | 0.894 | - |
+ | 0.448 | 2800 | 0.9027 | 0.7319 | 0.903 | - |
+ | 0.464 | 2900 | 0.9276 | 0.7038 | 0.893 | - |
+ | 0.48 | 3000 | 0.7692 | 0.6653 | 0.903 | - |
+ | 0.496 | 3100 | 0.8044 | 0.6466 | 0.901 | - |
+ | 0.512 | 3200 | 0.6433 | 0.6145 | 0.906 | - |
+ | 0.528 | 3300 | 0.6642 | 0.5774 | 0.912 | - |
+ | 0.544 | 3400 | 0.5904 | 0.6054 | 0.907 | - |
+ | 0.56 | 3500 | 0.6378 | 0.5841 | 0.91 | - |
+ | 0.576 | 3600 | 0.5602 | 0.5444 | 0.921 | - |
+ | 0.592 | 3700 | 0.6436 | 0.5563 | 0.917 | - |
+ | 0.608 | 3800 | 0.588 | 0.5108 | 0.927 | - |
+ | 0.624 | 3900 | 0.5834 | 0.5059 | 0.925 | - |
+ | 0.64 | 4000 | 0.842 | 0.5217 | 0.929 | - |
+ | 0.656 | 4100 | 1.0995 | 0.5060 | 0.933 | - |
+ | 0.672 | 4200 | 0.9605 | 0.4842 | 0.928 | - |
+ | 0.688 | 4300 | 0.7811 | 0.4756 | 0.93 | - |
+ | 0.704 | 4400 | 0.7288 | 0.4650 | 0.938 | - |
+ | 0.72 | 4500 | 0.6636 | 0.4576 | 0.94 | - |
+ | 0.736 | 4600 | 0.7445 | 0.4552 | 0.934 | - |
+ | 0.752 | 4700 | 0.7687 | 0.4511 | 0.934 | - |
+ | 0.768 | 4800 | 0.7101 | 0.4446 | 0.936 | - |
+ | 0.784 | 4900 | 0.6586 | 0.4378 | 0.937 | - |
+ | 0.8 | 5000 | 0.789 | 0.4368 | 0.938 | - |
+ | 0.816 | 5100 | 0.6227 | 0.4344 | 0.941 | - |
+ | 0.832 | 5200 | 0.6994 | 0.4349 | 0.939 | - |
+ | 0.848 | 5300 | 0.687 | 0.4327 | 0.943 | - |
+ | 0.864 | 5400 | 0.76 | 0.4319 | 0.943 | - |
+ | 0.88 | 5500 | 0.6644 | 0.4323 | 0.941 | - |
+ | 0.896 | 5600 | 0.6535 | 0.4306 | 0.941 | - |
+ | 0.912 | 5700 | 0.7622 | 0.4289 | 0.941 | - |
+ | 0.928 | 5800 | 0.7053 | 0.4288 | 0.94 | - |
+ | 0.944 | 5900 | 0.8093 | 0.4289 | 0.94 | - |
+ | 0.96 | 6000 | 0.8658 | 0.4284 | 0.941 | - |
+ | 0.976 | 6100 | 0.7624 | 0.4283 | 0.941 | - |
+ | 0.992 | 6200 | 0.0003 | 0.4286 | 0.941 | - |
+ | 1.0 | 6250 | - | - | - | 0.947 |


  ### Framework Versions
 
  }
  ```

+ #### CachedMultipleNegativesRankingLoss
  ```bibtex
+ @misc{gao2021scaling,
+ title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
+ author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
+ year={2021},
+ eprint={2101.06983},
  archivePrefix={arXiv},
+ primaryClass={cs.LG}
  }
  ```
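`batch_sampler: no_duplicates` is unchanged between the two versions of the hyperparameter list. With in-batch negatives, a text that appears twice in one batch would be scored as a negative for its own duplicate, which is presumably why such a sampler is used. The greedy sketch below illustrates the idea; it is not the sentence-transformers `NoDuplicatesBatchSampler` implementation, and the function name, rows and batch size are hypothetical.

```python
import random

def no_duplicate_batches(rows, batch_size, seed=0):
    """Greedy sketch (hypothetical helper): group row indices into batches so
    that no text string appears more than once within a batch.

    rows: sequence of tuples of strings, e.g. (anchor, positive, negative).
    Rows that clash with the current batch are deferred to a later batch.
    """
    rng = random.Random(seed)
    remaining = list(range(len(rows)))
    rng.shuffle(remaining)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(rows[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        batches.append(batch)
        remaining = deferred
    return batches

# Hypothetical rows: the first two share the anchor "a".
rows = [("a", "b", "c"), ("a", "d", "e"), ("f", "g", "h"), ("i", "j", "k")]
for batch in no_duplicate_batches(rows, batch_size=2):
    print([rows[i] for i in batch])
```

Each pass always accepts the first remaining row, so the loop terminates; rows that would introduce a duplicate simply wait for a later batch.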
 
final/config.json CHANGED
@@ -20,7 +20,7 @@
  "output_past": true,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
- "torch_dtype": "float32",
  "transformers_version": "4.41.2",
  "type_vocab_size": 1,
  "use_cache": true,
 
  "output_past": true,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
+ "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "type_vocab_size": 1,
  "use_cache": true,
final/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8c35c8055250a476dc32b87c601f3abe4bc9aa87098c4d6976e79dc6094a3af3
- size 2239607176
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:6e81195ec7ed25e3bc167b118e7676c00ae9f6d52630d8d25e4cfa5974ddf530
+ size 1119826072
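The pointer's `size` field roughly halves, which is what one would expect if the weights are now stored as bfloat16 (2 bytes per value) instead of float32 (4 bytes per value), matching the `torch_dtype` change in `final/config.json`. The sketch below parses the two Git LFS pointers from this diff and checks that arithmetic, assuming the small safetensors header can be ignored; both estimates come out at roughly 560 million stored values.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file, which stores one 'key value' pair per line."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer contents copied from the model.safetensors diff (before / after).
OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8c35c8055250a476dc32b87c601f3abe4bc9aa87098c4d6976e79dc6094a3af3
size 2239607176
"""
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6e81195ec7ed25e3bc167b118e7676c00ae9f6d52630d8d25e4cfa5974ddf530
size 1119826072
"""

old_size = int(parse_lfs_pointer(OLD_POINTER)["size"])
new_size = int(parse_lfs_pointer(NEW_POINTER)["size"])

# Assumption: the file is almost entirely tensor data; the small safetensors
# JSON header is ignored, so these counts are approximate.
params_if_float32 = old_size / 4   # float32 = 4 bytes per value
params_if_bfloat16 = new_size / 2  # bfloat16 = 2 bytes per value

print(f"size ratio new/old: {new_size / old_size:.5f}")
print(f"~{params_if_float32 / 1e6:.1f}M values before, ~{params_if_bfloat16 / 1e6:.1f}M after")
```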
runs/Jun03_21-23-40_ruche-gpu18.cluster/events.out.tfevents.1717442695.ruche-gpu18.cluster.1785.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6202c48153d42fde3cf0082ca6c6a70f02b954c582d4054b5ec57f7ab8f5969a
+ size 15963
runs/Jun03_21-55-04_ruche-gpu18.cluster/events.out.tfevents.1717444545.ruche-gpu18.cluster.20850.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d6d264d673c99450b336f2afe2e5b1eeabbe74cba8049adbc8aa526cd738e7de
+ size 56493