happii committed
Commit 0e8774a
1 Parent(s): f382ca9

Model save
README.md CHANGED
@@ -2,15 +2,9 @@
  license: apache-2.0
  base_model: alignment-handbook/zephyr-7b-sft-full
  tags:
- - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
- - trl
- - dpo
- - generated_from_trainer
- datasets:
- - HuggingFaceH4/ultrafeedback_binarized
  model-index:
  - name: zephyr-7b-dpo-full
    results: []
@@ -21,17 +15,17 @@ should probably proofread and complete it, then remove this comment. -->

  # zephyr-7b-dpo-full

- This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+ This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.5283
- - Rewards/chosen: -0.0163
- - Rewards/rejected: -1.2467
- - Rewards/accuracies: 0.7738
- - Rewards/margins: 1.2304
- - Logps/rejected: -272.6863
- - Logps/chosen: -282.1169
- - Logits/rejected: -2.5360
- - Logits/chosen: -2.5900
+ - Loss: 1.1475
+ - Rewards/chosen: -10.8669
+ - Rewards/rejected: -16.0106
+ - Rewards/accuracies: 0.7285
+ - Rewards/margins: 5.1437
+ - Logps/rejected: -424.5231
+ - Logps/chosen: -383.5809
+ - Logits/rejected: -0.5915
+ - Logits/chosen: -1.0022

  ## Model description

@@ -52,31 +46,78 @@ More information needed
  The following hyperparameters were used during training:
  - learning_rate: 5e-07
  - train_batch_size: 8
- - eval_batch_size: 8
+ - eval_batch_size: 16
  - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 4
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 64
- - total_eval_batch_size: 32
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 64
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
+ - lr_scheduler_type: linear
  - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1
+ - num_epochs: 3

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.5446 | 0.1047 | 100 | 0.5753 | 1.0111 | 0.3529 | 0.7242 | 0.6581 | -256.6898 | -271.8434 | -2.5161 | -2.5743 |
- | 0.5475 | 0.2093 | 200 | 0.5464 | 0.4347 | -0.4824 | 0.7639 | 0.9172 | -265.0432 | -277.6068 | -2.5380 | -2.5923 |
- | 0.5359 | 0.3140 | 300 | 0.5473 | 0.0697 | -1.0170 | 0.7579 | 1.0867 | -270.3889 | -281.2571 | -2.5066 | -2.5596 |
- | 0.5228 | 0.4186 | 400 | 0.5321 | -0.2311 | -1.3065 | 0.7540 | 1.0754 | -273.2837 | -284.2652 | -2.5933 | -2.6471 |
- | 0.5217 | 0.5233 | 500 | 0.5260 | 0.0143 | -1.2073 | 0.7877 | 1.2216 | -272.2919 | -281.8111 | -2.5195 | -2.5773 |
- | 0.517 | 0.6279 | 600 | 0.5262 | -0.2922 | -1.4562 | 0.7698 | 1.1640 | -274.7808 | -284.8755 | -2.5183 | -2.5744 |
- | 0.4766 | 0.7326 | 700 | 0.5279 | -0.0183 | -1.2936 | 0.7798 | 1.2753 | -273.1544 | -282.1366 | -2.5194 | -2.5751 |
- | 0.4894 | 0.8373 | 800 | 0.5257 | -0.0567 | -1.2594 | 0.7778 | 1.2027 | -272.8127 | -282.5211 | -2.5311 | -2.5851 |
- | 0.4722 | 0.9419 | 900 | 0.5280 | -0.0160 | -1.2503 | 0.7798 | 1.2343 | -272.7223 | -282.1141 | -2.5362 | -2.5901 |
+ | 0.6673 | 0.0523 | 100 | 0.6670 | 0.0699 | 0.0097 | 0.6797 | 0.0602 | -264.3204 | -274.2128 | -2.5742 | -2.6289 |
+ | 0.5806 | 0.1047 | 200 | 0.5926 | 0.3801 | -0.0108 | 0.7051 | 0.3909 | -264.5256 | -271.1104 | -2.5225 | -2.5806 |
+ | 0.554 | 0.1570 | 300 | 0.5669 | 0.3096 | -0.4486 | 0.7246 | 0.7581 | -268.9032 | -271.8162 | -2.4975 | -2.5603 |
+ | 0.5674 | 0.2093 | 400 | 0.5521 | 0.7133 | -0.0663 | 0.7246 | 0.7797 | -265.0810 | -267.7786 | -2.4794 | -2.5387 |
+ | 0.512 | 0.2616 | 500 | 0.5478 | 0.1922 | -0.9270 | 0.7266 | 1.1192 | -273.6879 | -272.9901 | -2.4185 | -2.4842 |
+ | 0.5511 | 0.3140 | 600 | 0.5389 | -0.0115 | -1.1320 | 0.7539 | 1.1205 | -275.7375 | -275.0270 | -2.3648 | -2.4308 |
+ | 0.5851 | 0.3663 | 700 | 0.5448 | 0.0450 | -1.1453 | 0.7402 | 1.1903 | -275.8708 | -274.4615 | -2.4055 | -2.4622 |
+ | 0.5302 | 0.4186 | 800 | 0.5569 | -0.2258 | -1.2912 | 0.7324 | 1.0653 | -277.3294 | -277.1702 | -2.5104 | -2.5742 |
+ | 0.518 | 0.4710 | 900 | 0.5607 | -0.2557 | -1.4332 | 0.75 | 1.1775 | -278.7496 | -277.4685 | -2.4298 | -2.4910 |
+ | 0.5525 | 0.5233 | 1000 | 0.5601 | -0.7719 | -1.9891 | 0.7480 | 1.2172 | -284.3084 | -282.6305 | -2.4482 | -2.5089 |
+ | 0.5189 | 0.5756 | 1100 | 0.5515 | -0.4040 | -1.5951 | 0.7422 | 1.1911 | -280.3683 | -278.9518 | -2.4816 | -2.5430 |
+ | 0.5331 | 0.6279 | 1200 | 0.5453 | -0.5342 | -1.7671 | 0.7383 | 1.2329 | -282.0886 | -280.2540 | -2.4521 | -2.5080 |
+ | 0.5104 | 0.6803 | 1300 | 0.5511 | -0.4634 | -1.8916 | 0.7363 | 1.4282 | -283.3339 | -279.5460 | -2.4281 | -2.4909 |
+ | 0.4976 | 0.7326 | 1400 | 0.5413 | -0.3748 | -1.7652 | 0.7363 | 1.3904 | -282.0694 | -278.6596 | -2.4395 | -2.4947 |
+ | 0.4814 | 0.7849 | 1500 | 0.5447 | -0.8885 | -2.1522 | 0.7305 | 1.2637 | -285.9394 | -283.7968 | -2.4376 | -2.4908 |
+ | 0.5075 | 0.8373 | 1600 | 0.5423 | -0.3051 | -1.5253 | 0.7344 | 1.2202 | -279.6703 | -277.9630 | -2.4316 | -2.4816 |
+ | 0.4906 | 0.8896 | 1700 | 0.5806 | -1.4841 | -3.0212 | 0.7266 | 1.5371 | -294.6296 | -289.7531 | -2.4876 | -2.5438 |
+ | 0.536 | 0.9419 | 1800 | 0.5603 | -0.5951 | -2.1710 | 0.7383 | 1.5759 | -286.1272 | -280.8625 | -2.5694 | -2.6123 |
+ | 0.5164 | 0.9942 | 1900 | 0.5567 | -0.5404 | -2.0173 | 0.7422 | 1.4769 | -284.5909 | -280.3160 | -2.5490 | -2.5898 |
+ | 0.0947 | 1.0466 | 2000 | 0.5942 | -1.0618 | -2.9986 | 0.7344 | 1.9369 | -294.4039 | -285.5296 | -2.5622 | -2.6140 |
+ | 0.068 | 1.0989 | 2100 | 0.6230 | -1.6457 | -3.9093 | 0.7520 | 2.2636 | -303.5109 | -291.3689 | -2.4361 | -2.5042 |
+ | 0.0747 | 1.1512 | 2200 | 0.6291 | -1.3268 | -3.4945 | 0.7461 | 2.1677 | -299.3621 | -288.1795 | -2.3844 | -2.4542 |
+ | 0.0553 | 1.2036 | 2300 | 0.6765 | -2.2209 | -4.6502 | 0.7344 | 2.4293 | -310.9199 | -297.1208 | -2.4889 | -2.5616 |
+ | 0.1207 | 1.2559 | 2400 | 0.6530 | -1.7158 | -3.9584 | 0.7246 | 2.2427 | -304.0018 | -292.0695 | -2.4457 | -2.5092 |
+ | 0.152 | 1.3082 | 2500 | 0.6882 | -1.8791 | -4.3806 | 0.7207 | 2.5015 | -308.2237 | -293.7032 | -2.4232 | -2.4917 |
+ | 0.1114 | 1.3605 | 2600 | 0.6422 | -2.2334 | -4.3890 | 0.7227 | 2.1556 | -308.3074 | -297.2458 | -2.5713 | -2.6189 |
+ | 0.1173 | 1.4129 | 2700 | 0.6619 | -1.5700 | -4.0282 | 0.7266 | 2.4581 | -304.6991 | -290.6119 | -2.5152 | -2.5719 |
+ | 0.0925 | 1.4652 | 2800 | 0.6523 | -2.3231 | -4.6279 | 0.7207 | 2.3048 | -310.6963 | -298.1424 | -2.5141 | -2.5711 |
+ | 0.1221 | 1.5175 | 2900 | 0.6496 | -2.8770 | -5.1437 | 0.7266 | 2.2667 | -315.8546 | -303.6823 | -2.4733 | -2.5414 |
+ | 0.0807 | 1.5699 | 3000 | 0.6925 | -2.7762 | -5.3350 | 0.7383 | 2.5588 | -317.7678 | -302.6737 | -2.3267 | -2.4141 |
+ | 0.105 | 1.6222 | 3100 | 0.6540 | -2.6858 | -5.0067 | 0.7246 | 2.3209 | -314.4846 | -301.7698 | -2.3683 | -2.4395 |
+ | 0.1162 | 1.6745 | 3200 | 0.6481 | -1.8133 | -4.0448 | 0.7148 | 2.2315 | -304.8652 | -293.0446 | -2.3670 | -2.4379 |
+ | 0.0667 | 1.7268 | 3300 | 0.6541 | -2.0364 | -4.3933 | 0.7363 | 2.3569 | -308.3506 | -295.2763 | -2.2794 | -2.3589 |
+ | 0.0935 | 1.7792 | 3400 | 0.6690 | -2.7292 | -5.2592 | 0.7441 | 2.5300 | -317.0096 | -302.2036 | -2.2855 | -2.3694 |
+ | 0.095 | 1.8315 | 3500 | 0.6361 | -2.9308 | -5.1591 | 0.7266 | 2.2284 | -316.0090 | -304.2198 | -2.3827 | -2.4530 |
+ | 0.0719 | 1.8838 | 3600 | 0.6778 | -2.3616 | -4.8272 | 0.7246 | 2.4656 | -312.6893 | -298.5278 | -2.4285 | -2.5018 |
+ | 0.0729 | 1.9362 | 3700 | 0.6754 | -2.9280 | -5.4360 | 0.7285 | 2.5080 | -318.7774 | -304.1916 | -2.4287 | -2.5049 |
+ | 0.0867 | 1.9885 | 3800 | 0.6744 | -3.0956 | -5.5458 | 0.7324 | 2.4502 | -319.8756 | -305.8675 | -2.3542 | -2.4301 |
+ | 0.0057 | 2.0408 | 3900 | 0.8833 | -5.0083 | -8.7774 | 0.7324 | 3.7690 | -352.1913 | -324.9953 | -1.5131 | -1.7155 |
+ | 0.0042 | 2.0931 | 4000 | 0.9722 | -6.1264 | -10.3554 | 0.7441 | 4.2290 | -367.9712 | -336.1759 | -1.6158 | -1.8694 |
+ | 0.0144 | 2.1455 | 4100 | 1.0865 | -7.7872 | -12.6090 | 0.7227 | 4.8218 | -390.5074 | -352.7837 | -1.3817 | -1.7022 |
+ | 0.0222 | 2.1978 | 4200 | 1.1130 | -7.9969 | -12.8510 | 0.7090 | 4.8541 | -392.9280 | -354.8811 | -1.3909 | -1.6967 |
+ | 0.0062 | 2.2501 | 4300 | 1.0722 | -8.7884 | -13.4773 | 0.7188 | 4.6889 | -399.1902 | -362.7955 | -1.5072 | -1.7459 |
+ | 0.0164 | 2.3025 | 4400 | 1.0993 | -8.7821 | -13.5683 | 0.7246 | 4.7862 | -400.1005 | -362.7325 | -1.2294 | -1.5182 |
+ | 0.0043 | 2.3548 | 4500 | 1.1250 | -9.9027 | -14.7785 | 0.7324 | 4.8758 | -412.2026 | -373.9385 | -0.7476 | -1.0957 |
+ | 0.0055 | 2.4071 | 4600 | 1.1975 | -10.4385 | -15.5644 | 0.7285 | 5.1258 | -420.0612 | -379.2971 | -0.5940 | -1.0020 |
+ | 0.0096 | 2.4594 | 4700 | 1.1443 | -10.2507 | -15.1793 | 0.7344 | 4.9286 | -416.2106 | -377.4187 | -0.9036 | -1.2413 |
+ | 0.0121 | 2.5118 | 4800 | 1.1422 | -10.3821 | -15.4221 | 0.7188 | 5.0400 | -418.6388 | -378.7332 | -0.8425 | -1.2175 |
+ | 0.0129 | 2.5641 | 4900 | 1.1155 | -9.3510 | -14.2451 | 0.7227 | 4.8941 | -406.8687 | -368.4216 | -0.9190 | -1.2930 |
+ | 0.0027 | 2.6164 | 5000 | 1.1905 | -10.7239 | -16.0360 | 0.7246 | 5.3121 | -424.7772 | -382.1504 | -0.6076 | -1.0264 |
+ | 0.0069 | 2.6688 | 5100 | 1.1635 | -10.2624 | -15.5178 | 0.7266 | 5.2555 | -419.5960 | -377.5356 | -0.7336 | -1.1315 |
+ | 0.009 | 2.7211 | 5200 | 1.1697 | -10.4591 | -15.6846 | 0.7266 | 5.2255 | -421.2634 | -379.5029 | -0.5587 | -0.9680 |
+ | 0.0088 | 2.7734 | 5300 | 1.1614 | -9.6958 | -14.8576 | 0.7246 | 5.1618 | -412.9938 | -371.8698 | -0.7312 | -1.1117 |
+ | 0.0078 | 2.8257 | 5400 | 1.1537 | -10.1101 | -15.2615 | 0.7168 | 5.1514 | -417.0325 | -376.0129 | -0.6843 | -1.0802 |
+ | 0.0209 | 2.8781 | 5500 | 1.1425 | -10.8046 | -15.9002 | 0.7266 | 5.0956 | -423.4199 | -382.9582 | -0.5316 | -0.9493 |
+ | 0.0145 | 2.9304 | 5600 | 1.1673 | -10.6083 | -15.8081 | 0.7266 | 5.1997 | -422.4983 | -380.9951 | -0.5878 | -1.0058 |
+ | 0.0189 | 2.9827 | 5700 | 1.1475 | -10.8669 | -16.0106 | 0.7285 | 5.1437 | -424.5231 | -383.5809 | -0.5915 | -1.0022 |


  ### Framework versions
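The Rewards/* columns above are the implicit rewards that trl's DPOTrainer logs during preference training. As a rough sketch of how those metrics relate to the reported loss (the log-probabilities and beta below are hypothetical illustrations, not values from this run, and the real implementation operates on batched tensors):

```python
import math

def dpo_metrics(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO-style metrics from sequence log-probs (illustrative)."""
    # Implicit rewards: beta-scaled log-ratio of policy vs. frozen reference
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # Sigmoid DPO loss for a single preference pair: -log(sigmoid(margin))
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    # "Rewards/accuracies" is the fraction of pairs with a positive margin
    accuracy = 1.0 if margin > 0 else 0.0
    return {"loss": loss, "rewards/chosen": reward_chosen,
            "rewards/rejected": reward_rejected,
            "rewards/margins": margin, "rewards/accuracies": accuracy}
```

Note that the card does not record the beta used for this run, so the 0.1 default here is only a placeholder.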
all_results.json CHANGED
@@ -1,22 +1,9 @@
  {
- "epoch": 0.9994767137624281,
- "eval_logits/chosen": -2.5900135040283203,
- "eval_logits/rejected": -2.5359914302825928,
- "eval_logps/chosen": -282.1169128417969,
- "eval_logps/rejected": -272.686279296875,
- "eval_loss": 0.5283266305923462,
- "eval_rewards/accuracies": 0.773809552192688,
- "eval_rewards/chosen": -0.016292406246066093,
- "eval_rewards/margins": 1.2304461002349854,
- "eval_rewards/rejected": -1.2467385530471802,
- "eval_runtime": 205.7802,
- "eval_samples": 2000,
- "eval_samples_per_second": 9.719,
- "eval_steps_per_second": 0.306,
+ "epoch": 3.0,
  "total_flos": 0.0,
- "train_loss": 0.528570184657711,
- "train_runtime": 17823.4182,
+ "train_loss": 0.21746732159101023,
+ "train_runtime": 62085.3048,
  "train_samples": 61134,
- "train_samples_per_second": 3.43,
- "train_steps_per_second": 0.054
+ "train_samples_per_second": 2.954,
+ "train_steps_per_second": 0.092
  }
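The new throughput figures are internally consistent with a 3-epoch run over 61134 samples; a quick sanity check (numbers copied from the diff, with the per-epoch step count an assumption derived from the README's total_train_batch_size of 32):

```python
import math

# Figures from the updated all_results.json and README hyperparameters
train_samples = 61134
epochs = 3.0
train_runtime = 62085.3048          # seconds
total_train_batch_size = 32

samples_per_second = train_samples * epochs / train_runtime
# Assumes the final partial batch of each epoch counts as a step
steps_per_epoch = math.ceil(train_samples / total_train_batch_size)
steps_per_second = steps_per_epoch * epochs / train_runtime

print(round(samples_per_second, 3), round(steps_per_second, 3))
# → 2.954 0.092, matching the reported values
```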
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3b1fd55fd4d17cad855ab9d45ac20a8b8fdcfc540a35a4967bb5ae2a3bc1bba5
+ oid sha256:cf6335a383119c6240ed2ccd1790c1b4c7895893610e82dae8e1380a4f186a13
  size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:535b15dbb05eedc817a81d7b3e9bfec99e0b560cc28d17893ba3bb2fef53083a
+ oid sha256:6fa2f03d962c968ec3ab3722be87b02895e54fe097cb9bd74f76ac68c23e7a49
  size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b5d8e6c1f583c7a8841c68e9e269d659c35e1cbe70880bcd676a0b75972a71b4
+ oid sha256:cc0ac967786623ddf34044289298f49f18f0321cc64869ed3fb64e7bbb3f1b52
  size 4540516344
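Each shard entry above is a Git LFS pointer, so a downloaded shard can be verified against the sha256 oid it records. A minimal sketch (the local file path in the usage comment is illustrative):

```python
import hashlib

def sha256_hex(path, chunk_size=1 << 20):
    """Stream a file through sha256 in 1 MiB chunks, return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage (expected oid taken from the model-00001 pointer above):
# assert sha256_hex("model-00001-of-00003.safetensors") == (
#     "cf6335a383119c6240ed2ccd1790c1b4c7895893610e82dae8e1380a4f186a13")
```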
runs/May26_06-26-16_ubuntu/events.out.tfevents.1716705224.ubuntu.2970062.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c7e78b977aa971e6e480d57f97a8592a81edbb30f7d97adb0ba2474fcf404cb7
- size 439760
+ oid sha256:4ef747ec9e3d45ff22496172e4a3e775fcab85bd04588427e69649bcf1c7119e
+ size 442178
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
- "epoch": 0.9994767137624281,
+ "epoch": 3.0,
  "total_flos": 0.0,
- "train_loss": 0.528570184657711,
- "train_runtime": 17823.4182,
+ "train_loss": 0.21746732159101023,
+ "train_runtime": 62085.3048,
  "train_samples": 61134,
- "train_samples_per_second": 3.43,
- "train_steps_per_second": 0.054
+ "train_samples_per_second": 2.954,
+ "train_steps_per_second": 0.092
  }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff