martimfasantos committed on
Commit
12d6811
1 Parent(s): 941617c

Model save

README.md ADDED
@@ -0,0 +1,113 @@
+ ---
+ license: apache-2.0
+ base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old
+
+ This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old) on an unknown dataset.
+ It achieves the following results on the evaluation set (a note on how to read the reward columns follows the list):
+ - Loss: 0.6303
+ - Rewards/chosen: -1.4484
+ - Rewards/rejected: -1.8080
+ - Rewards/accuracies: 0.6436
+ - Rewards/margins: 0.3596
+ - Logps/rejected: -243.9776
+ - Logps/chosen: -203.5508
+ - Logits/rejected: -1.7024
+ - Logits/chosen: -1.7262
+
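+ *Note on reward semantics: assuming the standard TRL DPO formulation (which the `trl`/`dpo` tags and column names suggest), the reported rewards are the implicit DPO rewards*
+
+ $$r(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),$$
+
+ *so `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected` (here $-1.4484 - (-1.8080) = 0.3596$), and `Rewards/accuracies` is the fraction of preference pairs whose chosen response receives the higher implicit reward.*
+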
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a sketch of how they map onto a TRL run follows the list):
+ - learning_rate: 2e-07
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 3
+
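+ A minimal sketch of how these hyperparameters could map onto a TRL `DPOTrainer` run. This is an illustration consistent with the `trl`/`dpo` tags, not the actual training script; in particular, `beta` (TRL's default of 0.1 is assumed) and the preference dataset are not recorded in this card, so both are placeholders. Multi-GPU launching (e.g., via `accelerate`) is omitted.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
+ from trl import DPOTrainer
+
+ base = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
+ model = AutoModelForCausalLM.from_pretrained(base)
+ ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
+ tokenizer = AutoTokenizer.from_pretrained(base)
+
+ preference_dataset = ...  # placeholder: the actual preference dataset is unknown
+
+ # Adam betas/epsilon above are the TrainingArguments defaults, so they
+ # need no explicit setting here.
+ args = TrainingArguments(
+     output_dir="tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old",
+     learning_rate=2e-7,
+     per_device_train_batch_size=8,   # total_train_batch_size 16 =
+     gradient_accumulation_steps=2,   #   8 per device x 2 accumulation steps
+     per_device_eval_batch_size=8,
+     num_train_epochs=3,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     seed=42,
+ )
+
+ trainer = DPOTrainer(
+     model,
+     ref_model,
+     args=args,
+     beta=0.1,                          # assumption: TRL default, not logged above
+     train_dataset=preference_dataset,  # placeholder, see note above
+     tokenizer=tokenizer,
+ )
+ trainer.train()
+ ```
+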
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6931 | 0.0689 | 400 | 0.6932 | 0.0002 | 0.0003 | 0.4654 | -0.0001 | -63.1542 | -58.6924 | -3.1574 | -3.1630 |
+ | 0.692 | 0.1378 | 800 | 0.6928 | 0.0015 | 0.0008 | 0.5525 | 0.0007 | -63.0955 | -58.5586 | -3.1518 | -3.1574 |
+ | 0.6902 | 0.2068 | 1200 | 0.6914 | 0.0009 | -0.0027 | 0.5876 | 0.0037 | -63.4527 | -58.6187 | -3.1281 | -3.1338 |
+ | 0.6835 | 0.2757 | 1600 | 0.6888 | -0.0225 | -0.0320 | 0.5864 | 0.0096 | -66.3833 | -60.9598 | -3.0838 | -3.0895 |
+ | 0.6778 | 0.3446 | 2000 | 0.6845 | -0.0724 | -0.0918 | 0.5976 | 0.0194 | -72.3574 | -65.9486 | -3.0213 | -3.0270 |
+ | 0.6688 | 0.4135 | 2400 | 0.6792 | -0.1403 | -0.1725 | 0.6032 | 0.0323 | -80.4345 | -72.7375 | -2.9370 | -2.9428 |
+ | 0.6675 | 0.4824 | 2800 | 0.6732 | -0.2283 | -0.2756 | 0.6057 | 0.0472 | -90.7353 | -81.5436 | -2.8576 | -2.8635 |
+ | 0.6437 | 0.5513 | 3200 | 0.6646 | -0.3557 | -0.4265 | 0.6120 | 0.0708 | -105.8322 | -94.2796 | -2.7546 | -2.7607 |
+ | 0.6516 | 0.6203 | 3600 | 0.6602 | -0.4125 | -0.4982 | 0.6178 | 0.0856 | -112.9954 | -99.9643 | -2.6547 | -2.6612 |
+ | 0.6264 | 0.6892 | 4000 | 0.6514 | -0.5858 | -0.7050 | 0.6315 | 0.1192 | -133.6785 | -117.2944 | -2.5252 | -2.5324 |
+ | 0.6109 | 0.7581 | 4400 | 0.6474 | -0.6217 | -0.7587 | 0.6313 | 0.1370 | -139.0484 | -120.8850 | -2.4041 | -2.4124 |
+ | 0.6153 | 0.8270 | 4800 | 0.6432 | -0.7112 | -0.8720 | 0.6266 | 0.1608 | -150.3814 | -129.8305 | -2.3206 | -2.3302 |
+ | 0.6107 | 0.8959 | 5200 | 0.6407 | -0.7470 | -0.9249 | 0.6350 | 0.1779 | -155.6741 | -133.4166 | -2.2363 | -2.2476 |
+ | 0.6061 | 0.9649 | 5600 | 0.6392 | -0.7851 | -0.9723 | 0.6315 | 0.1871 | -160.4070 | -137.2255 | -2.1733 | -2.1859 |
+ | 0.5701 | 1.0338 | 6000 | 0.6356 | -1.0035 | -1.2450 | 0.6292 | 0.2415 | -187.6758 | -159.0581 | -2.0122 | -2.0292 |
+ | 0.5557 | 1.1027 | 6400 | 0.6358 | -1.0296 | -1.2785 | 0.6322 | 0.2489 | -191.0262 | -161.6682 | -1.9777 | -1.9953 |
+ | 0.5292 | 1.1716 | 6800 | 0.6333 | -1.0878 | -1.3492 | 0.6313 | 0.2614 | -198.1001 | -167.4900 | -1.8969 | -1.9159 |
+ | 0.5473 | 1.2405 | 7200 | 0.6354 | -1.0479 | -1.2958 | 0.6262 | 0.2479 | -192.7597 | -163.5001 | -1.9044 | -1.9226 |
+ | 0.6231 | 1.3094 | 7600 | 0.6346 | -1.2184 | -1.4979 | 0.6289 | 0.2795 | -212.9705 | -180.5535 | -1.8355 | -1.8558 |
+ | 0.5403 | 1.3784 | 8000 | 0.6339 | -1.1437 | -1.4111 | 0.6264 | 0.2673 | -204.2867 | -173.0842 | -1.8647 | -1.8848 |
+ | 0.5444 | 1.4473 | 8400 | 0.6339 | -1.0726 | -1.3310 | 0.6287 | 0.2584 | -196.2827 | -165.9765 | -1.8568 | -1.8768 |
+ | 0.5766 | 1.5162 | 8800 | 0.6329 | -1.0364 | -1.2879 | 0.6336 | 0.2516 | -191.9749 | -162.3483 | -1.8819 | -1.9009 |
+ | 0.525 | 1.5851 | 9200 | 0.6320 | -1.1870 | -1.4611 | 0.6366 | 0.2740 | -209.2869 | -177.4161 | -1.8122 | -1.8325 |
+ | 0.5174 | 1.6540 | 9600 | 0.6310 | -1.2662 | -1.5606 | 0.6375 | 0.2944 | -219.2438 | -185.3348 | -1.7597 | -1.7810 |
+ | 0.5312 | 1.7229 | 10000 | 0.6313 | -1.2979 | -1.6013 | 0.6359 | 0.3033 | -223.3081 | -188.5056 | -1.7629 | -1.7848 |
+ | 0.4923 | 1.7919 | 10400 | 0.6312 | -1.1596 | -1.4412 | 0.6334 | 0.2815 | -207.2955 | -174.6746 | -1.7754 | -1.7966 |
+ | 0.5386 | 1.8608 | 10800 | 0.6304 | -1.2706 | -1.5735 | 0.6373 | 0.3029 | -220.5279 | -185.7685 | -1.7500 | -1.7722 |
+ | 0.5178 | 1.9297 | 11200 | 0.6295 | -1.2859 | -1.6008 | 0.6443 | 0.3149 | -223.2599 | -187.3036 | -1.7272 | -1.7501 |
+ | 0.5556 | 1.9986 | 11600 | 0.6295 | -1.2652 | -1.5714 | 0.6362 | 0.3062 | -220.3214 | -185.2294 | -1.7356 | -1.7580 |
+ | 0.4901 | 2.0675 | 12000 | 0.6303 | -1.4749 | -1.8246 | 0.6447 | 0.3497 | -245.6420 | -206.2009 | -1.6688 | -1.6928 |
+ | 0.4713 | 2.1365 | 12400 | 0.6303 | -1.6230 | -2.0017 | 0.6471 | 0.3786 | -263.3478 | -221.0147 | -1.6397 | -1.6644 |
+ | 0.5188 | 2.2054 | 12800 | 0.6305 | -1.4593 | -1.8052 | 0.6408 | 0.3458 | -243.6979 | -204.6454 | -1.6776 | -1.7011 |
+ | 0.5395 | 2.2743 | 13200 | 0.6315 | -1.5373 | -1.9051 | 0.6429 | 0.3678 | -253.6892 | -212.4377 | -1.6591 | -1.6834 |
+ | 0.5059 | 2.3432 | 13600 | 0.6318 | -1.4799 | -1.8381 | 0.6431 | 0.3582 | -246.9884 | -206.6992 | -1.6812 | -1.7051 |
+ | 0.4543 | 2.4121 | 14000 | 0.6318 | -1.3717 | -1.7109 | 0.6459 | 0.3392 | -234.2693 | -195.8793 | -1.7134 | -1.7366 |
+ | 0.5121 | 2.4810 | 14400 | 0.6308 | -1.4206 | -1.7736 | 0.6447 | 0.3530 | -240.5389 | -200.7700 | -1.7016 | -1.7252 |
+ | 0.4847 | 2.5500 | 14800 | 0.6304 | -1.4817 | -1.8498 | 0.6443 | 0.3681 | -248.1589 | -206.8796 | -1.6912 | -1.7153 |
+ | 0.4701 | 2.6189 | 15200 | 0.6306 | -1.4145 | -1.7659 | 0.6445 | 0.3514 | -239.7732 | -200.1665 | -1.7090 | -1.7324 |
+ | 0.5011 | 2.6878 | 15600 | 0.6304 | -1.4080 | -1.7575 | 0.6434 | 0.3495 | -238.9349 | -199.5119 | -1.7135 | -1.7369 |
+ | 0.4936 | 2.7567 | 16000 | 0.6304 | -1.4490 | -1.8088 | 0.6436 | 0.3598 | -244.0595 | -203.6143 | -1.7010 | -1.7248 |
+ | 0.4952 | 2.8256 | 16400 | 0.6312 | -1.4483 | -1.8060 | 0.6438 | 0.3577 | -243.7794 | -203.5389 | -1.7043 | -1.7279 |
+ | 0.5024 | 2.8946 | 16800 | 0.6304 | -1.4492 | -1.8094 | 0.6429 | 0.3602 | -244.1201 | -203.6308 | -1.7037 | -1.7274 |
+ | 0.5054 | 2.9635 | 17200 | 0.6303 | -1.4484 | -1.8080 | 0.6436 | 0.3596 | -243.9776 | -203.5508 | -1.7024 | -1.7262 |
+
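+ As a consistency check on the curve above, assuming the standard DPO loss $-\log \sigma(\beta \Delta)$, where $\Delta$ is the chosen-minus-rejected reward margin: at the start of training the policy still equals the reference model, so $\Delta = 0$ and the loss is $-\log \sigma(0) = \log 2 \approx 0.6931$, which matches the first entries in the table (0.6931/0.6932 at step 400, when the policy has barely moved from the reference).
+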
+ ### Framework versions
+
+ - Transformers 4.41.2
+ - Pytorch 2.1.2
+ - Datasets 2.19.2
+ - Tokenizers 0.19.1
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 3.0,
+     "total_flos": 0.0,
+     "train_loss": 0.5724969553024556,
+     "train_runtime": 86264.9537,
+     "train_samples": 92858,
+     "train_samples_per_second": 3.229,
+     "train_steps_per_second": 0.202
+ }
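The throughput figures above are internally consistent: 3 epochs × 92,858 samples ≈ 278,574 examples seen, and 278,574 / 86,264.95 s ≈ 3.229 samples per second; with an effective batch size of 16, that is ≈ 17,411 optimizer steps, or ≈ 0.202 steps per second.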
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+     "bos_token_id": 1,
+     "eos_token_id": 2,
+     "max_length": 2048,
+     "pad_token_id": 0,
+     "transformers_version": "4.41.2"
+ }
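These defaults are applied automatically when calling `generate`. A quick way to inspect them with the standard `transformers` API (a sketch, assuming the model id from the card above):

```python
from transformers import GenerationConfig

# Load the generation defaults shipped with the model repository.
gen_config = GenerationConfig.from_pretrained(
    "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old"
)
print(gen_config.max_length)     # 2048
print(gen_config.eos_token_id)   # 2
```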
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3a76876a740fd471f1036d0e7f556e3bb7a0113ac066a877e25741ff27d86752
+ oid sha256:0cf069d97b932ed27f02f1b30381e022374783ad50c007eb6cece200aa0d186f
  size 4400216536
runs/Jun11_00-56-54_poseidon/events.out.tfevents.1718067780.poseidon.4172683.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:57b8bad68ef7bbc97834eae8e4ed76ee7a43e9fe04578ea806bb0de7e3ff4cac
- size 1221895
+ oid sha256:f61eb4857897594467d8b0c3d3529ee5f9bbe8e3ea175d27abcfe2c62b418de1
+ size 1236955
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 3.0,
+     "total_flos": 0.0,
+     "train_loss": 0.5724969553024556,
+     "train_runtime": 86264.9537,
+     "train_samples": 92858,
+     "train_samples_per_second": 3.229,
+     "train_steps_per_second": 0.202
+ }
trainer_state.json ADDED
The diff for this file is too large to render.