happii committed
Commit 3a82156
1 parent: b00fe91

Model save
README.md CHANGED
@@ -1,4 +1,6 @@
 ---
+license: apache-2.0
+base_model: alignment-handbook/zephyr-7b-sft-full
 tags:
 - trl
 - dpo
@@ -13,17 +15,17 @@ should probably proofread and complete it, then remove this comment. -->
 
 # zephyr-7b-dpo-full
 
-This model was trained from scratch on an unknown dataset.
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.5082
-- Rewards/chosen: -1.1578
-- Rewards/rejected: -2.0459
+- Loss: 0.9106
+- Rewards/chosen: -6.3982
+- Rewards/rejected: -10.5970
 - Rewards/accuracies: 0.7639
-- Rewards/margins: 0.8881
-- Logps/rejected: -470.7423
-- Logps/chosen: -407.5118
-- Logits/rejected: 3.4043
-- Logits/chosen: 2.7671
+- Rewards/margins: 4.1988
+- Logps/rejected: -366.1889
+- Logps/chosen: -345.9361
+- Logits/rejected: -1.3900
+- Logits/chosen: -1.6525
 
 ## Model description
 
@@ -54,21 +56,40 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 1
+- num_epochs: 3
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.652 | 0.1047 | 100 | 0.6516 | -0.0197 | -0.1250 | 0.6905 | 0.1053 | -278.6530 | -293.7002 | -2.3910 | -2.4541 |
-| 0.5817 | 0.2093 | 200 | 0.5833 | -0.8527 | -1.3403 | 0.7123 | 0.4876 | -400.1837 | -376.9992 | -1.5444 | -1.6837 |
-| 0.5434 | 0.3140 | 300 | 0.5530 | -0.9620 | -1.6381 | 0.7460 | 0.6761 | -429.9622 | -387.9330 | -0.5465 | -0.7917 |
-| 0.5601 | 0.4186 | 400 | 0.5357 | -0.8421 | -1.5059 | 0.7440 | 0.6638 | -416.7414 | -375.9344 | 1.0675 | 0.6506 |
-| 0.523 | 0.5233 | 500 | 0.5214 | -1.0264 | -1.8394 | 0.7599 | 0.8130 | -450.0945 | -394.3706 | 2.7809 | 2.2498 |
-| 0.4939 | 0.6279 | 600 | 0.5188 | -1.2174 | -2.0583 | 0.7599 | 0.8409 | -471.9797 | -413.4645 | 2.9773 | 2.3838 |
-| 0.4934 | 0.7326 | 700 | 0.5118 | -1.2353 | -2.1356 | 0.7698 | 0.9003 | -479.7107 | -415.2548 | 3.3093 | 2.6735 |
-| 0.4975 | 0.8373 | 800 | 0.5096 | -1.1525 | -2.0253 | 0.7679 | 0.8729 | -468.6864 | -406.9773 | 3.3466 | 2.7191 |
-| 0.4913 | 0.9419 | 900 | 0.5082 | -1.1578 | -2.0459 | 0.7639 | 0.8881 | -470.7423 | -407.5118 | 3.4043 | 2.7671 |
+| 0.6044 | 0.1047 | 100 | 0.6129 | 0.3596 | 0.0870 | 0.7302 | 0.2726 | -259.3489 | -278.3580 | -2.5834 | -2.6369 |
+| 0.57 | 0.2093 | 200 | 0.5571 | 0.5922 | -0.1676 | 0.7540 | 0.7598 | -261.8945 | -276.0320 | -2.4867 | -2.5465 |
+| 0.5429 | 0.3140 | 300 | 0.5366 | 0.0019 | -0.9625 | 0.7540 | 0.9644 | -269.8440 | -281.9351 | -2.3542 | -2.4208 |
+| 0.5168 | 0.4186 | 400 | 0.5452 | 0.1591 | -0.8845 | 0.7599 | 1.0436 | -269.0635 | -280.3629 | -2.4760 | -2.5389 |
+| 0.5337 | 0.5233 | 500 | 0.5324 | 0.1371 | -1.0631 | 0.7778 | 1.2002 | -270.8497 | -280.5833 | -2.4225 | -2.4845 |
+| 0.5163 | 0.6279 | 600 | 0.5369 | -0.3785 | -1.5394 | 0.7560 | 1.1609 | -275.6129 | -285.7394 | -2.4333 | -2.4912 |
+| 0.4881 | 0.7326 | 700 | 0.5380 | 0.1243 | -1.2129 | 0.7679 | 1.3371 | -272.3477 | -280.7114 | -2.3892 | -2.4505 |
+| 0.49 | 0.8373 | 800 | 0.5411 | 0.1149 | -1.0375 | 0.7639 | 1.1524 | -270.5944 | -280.8054 | -2.4479 | -2.5044 |
+| 0.5097 | 0.9419 | 900 | 0.5622 | -0.2002 | -1.4670 | 0.7698 | 1.2668 | -274.8889 | -283.9564 | -2.5298 | -2.5820 |
+| 0.1144 | 1.0466 | 1000 | 0.5714 | -0.2947 | -1.8774 | 0.7639 | 1.5826 | -278.9927 | -284.9014 | -2.5495 | -2.6080 |
+| 0.087 | 1.1512 | 1100 | 0.5960 | -0.6932 | -2.6301 | 0.7837 | 1.9369 | -286.5200 | -288.8864 | -2.5036 | -2.5699 |
+| 0.1122 | 1.2559 | 1200 | 0.6133 | -1.5655 | -3.6620 | 0.7540 | 2.0965 | -296.8384 | -297.6089 | -2.4063 | -2.4765 |
+| 0.1303 | 1.3605 | 1300 | 0.6040 | -1.7575 | -3.6828 | 0.7837 | 1.9252 | -297.0464 | -299.5291 | -2.3747 | -2.4470 |
+| 0.0884 | 1.4652 | 1400 | 0.6035 | -1.4203 | -3.2606 | 0.7798 | 1.8403 | -292.8251 | -296.1571 | -2.3840 | -2.4553 |
+| 0.0807 | 1.5699 | 1500 | 0.6033 | -1.8277 | -3.9141 | 0.7877 | 2.0864 | -299.3599 | -300.2314 | -2.3962 | -2.4731 |
+| 0.1027 | 1.6745 | 1600 | 0.6157 | -1.3414 | -3.3683 | 0.7857 | 2.0269 | -293.9024 | -295.3680 | -2.3746 | -2.4536 |
+| 0.0989 | 1.7792 | 1700 | 0.6009 | -1.4146 | -3.5889 | 0.7917 | 2.1744 | -296.1083 | -296.0996 | -2.3750 | -2.4548 |
+| 0.0945 | 1.8838 | 1800 | 0.6109 | -1.1285 | -3.3269 | 0.7877 | 2.1984 | -293.4879 | -293.2390 | -2.4051 | -2.4825 |
+| 0.0789 | 1.9885 | 1900 | 0.6093 | -1.9115 | -4.0587 | 0.7837 | 2.1472 | -300.8062 | -301.0694 | -2.3968 | -2.4730 |
+| 0.0086 | 2.0931 | 2000 | 0.7414 | -2.9121 | -5.9384 | 0.7758 | 3.0263 | -319.6029 | -311.0746 | -2.2016 | -2.2928 |
+| 0.0137 | 2.1978 | 2100 | 0.8116 | -4.6780 | -8.1860 | 0.7679 | 3.5080 | -342.0789 | -328.7336 | -1.8924 | -2.0338 |
+| 0.0152 | 2.3025 | 2200 | 0.8371 | -5.0993 | -8.7589 | 0.7679 | 3.6596 | -347.8080 | -332.9471 | -1.8207 | -1.9887 |
+| 0.0062 | 2.4071 | 2300 | 0.8704 | -6.2532 | -10.1416 | 0.7679 | 3.8884 | -361.6346 | -344.4856 | -1.5897 | -1.8086 |
+| 0.0124 | 2.5118 | 2400 | 0.8848 | -5.6604 | -9.6724 | 0.7698 | 4.0120 | -356.9429 | -338.5582 | -1.5561 | -1.7751 |
+| 0.0078 | 2.6164 | 2500 | 0.8926 | -6.1681 | -10.2415 | 0.7679 | 4.0734 | -362.6336 | -343.6352 | -1.4181 | -1.6590 |
+| 0.0083 | 2.7211 | 2600 | 0.9002 | -6.5323 | -10.6541 | 0.7659 | 4.1218 | -366.7602 | -347.2773 | -1.3929 | -1.6493 |
+| 0.0115 | 2.8257 | 2700 | 0.9076 | -6.4271 | -10.6033 | 0.7639 | 4.1762 | -366.2516 | -346.2245 | -1.4047 | -1.6632 |
+| 0.0134 | 2.9304 | 2800 | 0.9106 | -6.3982 | -10.5970 | 0.7639 | 4.1988 | -366.1889 | -345.9361 | -1.3900 | -1.6525 |
 
 
 ### Framework versions
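The Rewards/* and Loss columns in the card above follow the usual DPO bookkeeping: each implicit reward is beta times the gap between the policy's and the frozen reference model's log-probability of a completion, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. A minimal sketch of that arithmetic (the beta value and the log-probability inputs below are illustrative, not values taken from this run):

```python
import math

def dpo_stats(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO per-pair metrics: implicit rewards, margin, and loss.

    Each argument is the summed log-probability of a completion under
    the policy or the frozen reference model. beta=0.1 is an
    illustrative choice, not necessarily this run's setting.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # DPO loss: -log(sigmoid(margin)), written stably via log1p.
    if margin >= 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))
    return {"rewards/chosen": reward_chosen,
            "rewards/rejected": reward_rejected,
            "rewards/margins": margin,
            "loss": loss}
```

Note how this links the card's columns: a positive margin means the policy separates chosen from rejected better than the reference does, and Rewards/accuracies is simply the fraction of pairs with a positive margin.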
all_results.json CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch": 0.9994767137624281,
+    "epoch": 2.998430141287284,
     "total_flos": 0.0,
-    "train_loss": 0.546376849968396,
-    "train_runtime": 17587.6764,
+    "train_loss": 0.21882489114921755,
+    "train_runtime": 49934.9402,
     "train_samples": 61134,
-    "train_samples_per_second": 3.476,
-    "train_steps_per_second": 0.054
+    "train_samples_per_second": 3.673,
+    "train_steps_per_second": 0.057
 }
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9d200a77dccf726f41ed8cf92716de458b39171937491ae87a7b744fb8a87cd5
+oid sha256:86c5b45d22034ce7b89eaa158037838555f74fbdd0d9300dbb2e1bb0b3f9dcf2
 size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3a8a602d8ee31ff3d4aa54dbde4bf615ce6be798331993dfbb13ef6df327f7fe
+oid sha256:588e572b158221130f0142ff3a24869a60e624c2ac607d289f55ed6d0e02fce2
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:168ead9eb319b2d5b5fc7d0647e965f8cd2c8e73b6f479d3bf1b8f654f59aa37
+oid sha256:90c09400b922aa1b978fa46682c1af84bf8b301a834dfe99680ff3ac83e73188
 size 4540516344
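The `.safetensors` entries in this diff are Git LFS pointer files rather than the weights themselves: three `key value` lines giving the pointer spec version, the sha256 digest of the actual blob, and its size in bytes. A small parser sketch for that pointer format (an illustration, not the official git-lfs tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its three fields."""
    # Each line is "key value"; partition on the first space.
    fields = dict(line.partition(" ")[::2]
                  for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"],
            "hash_algo": algo,
            "oid": digest,
            "size": int(fields["size"])}

# The new pointer for model-00001-of-00003.safetensors from the diff above.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:86c5b45d22034ce7b89eaa158037838555f74fbdd0d9300dbb2e1bb0b3f9dcf2\n"
    "size 4943162336\n"
)
```

This is why the shard sizes are unchanged in the diff while the oids differ: the pointer records the new blob's digest, and only the LFS store sees the new 4.9 GB of weights.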
runs/May14_17-59-18_ubuntu/events.out.tfevents.1715709983.ubuntu.774053.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:04310fd4fe22faa1292fffa2647f7ad12ef107ab48c234631aafead6ba0ca620
-size 218743
+oid sha256:39e5a54dafc2cc6797104e2b48a0acc02a95708b20217cb29687c75dec886b75
+size 223225
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch": 0.9994767137624281,
+    "epoch": 2.998430141287284,
     "total_flos": 0.0,
-    "train_loss": 0.546376849968396,
-    "train_runtime": 17587.6764,
+    "train_loss": 0.21882489114921755,
+    "train_runtime": 49934.9402,
     "train_samples": 61134,
-    "train_samples_per_second": 3.476,
-    "train_steps_per_second": 0.054
+    "train_samples_per_second": 3.673,
+    "train_steps_per_second": 0.057
 }
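As a sanity check, the new throughput figure appears consistent with the other fields: samples per second is the total samples seen (dataset size times the nominal epoch count, here 3) divided by wall-clock runtime. A quick verification, with the field values copied from the updated JSON above:

```python
# Values copied from the updated train_results.json in this commit.
summary = {
    "epoch": 2.998430141287284,
    "train_runtime": 49934.9402,
    "train_samples": 61134,
    "train_samples_per_second": 3.673,
}

def samples_per_second(s: dict) -> float:
    # Throughput = dataset size * nominal epoch count / wall-clock seconds.
    # (Assumes the reported figure uses the whole-number epoch count, 3.)
    return s["train_samples"] * round(s["epoch"]) / s["train_runtime"]
```

With these numbers, 61134 * 3 / 49934.9402 is about 3.6726, matching the reported 3.673 to rounding.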
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff