nnheui committed
Commit e0adbef
Parent: 0c310f6

Model save
README.md CHANGED
@@ -2,15 +2,10 @@
  license: apache-2.0
  base_model: nnheui/pythia-1.4b-sft-full
  tags:
- - alignment-handbook
- - trl
- - dpo
- - generated_from_trainer
  - trl
  - dpo
+ - alignment-handbook
  - generated_from_trainer
- datasets:
- - HuggingFaceH4/ultrafeedback_binarized
  model-index:
  - name: pythia-1.4b-dpo-full
    results: []
@@ -21,21 +16,21 @@ should probably proofread and complete it, then remove this comment. -->
 
  # pythia-1.4b-dpo-full
 
- This model is a fine-tuned version of [nnheui/pythia-1.4b-sft-full](https://huggingface.co/nnheui/pythia-1.4b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+ This model is a fine-tuned version of [nnheui/pythia-1.4b-sft-full](https://huggingface.co/nnheui/pythia-1.4b-sft-full) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.6257
- - Rewards/chosen: -0.5234
- - Rewards/rejected: -0.7812
- - Rewards/accuracies: 0.6597
- - Rewards/margins: 0.2578
- - Logps/rejected: -416.0
- - Logps/chosen: -446.0
- - Logits/rejected: -1.2422
- - Logits/chosen: -1.1953
- - Logps/chosen Top Tokens: -0.0007
- - Logps/rejected Top Tokens: -0.0007
- - Logps/chosen Bottom Tokens: -14.375
- - Logps/rejected Bottom Tokens: -14.3125
+ - Loss: 0.5967
+ - Rewards/chosen: -1.8672
+ - Rewards/rejected: -2.6406
+ - Rewards/accuracies: 0.7134
+ - Rewards/margins: 0.7734
+ - Logps/rejected: -600.0
+ - Logps/chosen: -580.0
+ - Logits/rejected: -1.4375
+ - Logits/chosen: -1.4062
+ - Logps/chosen Top Tokens: -0.0009
+ - Logps/rejected Top Tokens: -0.0008
+ - Logps/chosen Bottom Tokens: -13.9375
+ - Logps/rejected Bottom Tokens: -13.8125
 
  ## Model description
 
@@ -66,17 +61,62 @@ The following hyperparameters were used during training:
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1
+ - num_epochs: 10
 
  ### Training results
 
- | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Logps/chosen Top Tokens | Logps/rejected Top Tokens | Logps/chosen Bottom Tokens | Logps/rejected Bottom Tokens |
- |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:-----------------------:|:-------------------------:|:--------------------------:|:----------------------------:|
- | 0.678 | 0.1963 | 100 | 0.6789 | -0.0275 | -0.0608 | 0.5881 | 0.0332 | -344.0 | -396.0 | -1.1562 | -1.0938 | -0.0009 | -0.0009 | -14.0625 | -14.0 |
- | 0.645 | 0.3925 | 200 | 0.6489 | -0.2871 | -0.4238 | 0.6448 | 0.1367 | -380.0 | -422.0 | -1.2031 | -1.1562 | -0.0009 | -0.0009 | -14.375 | -14.3125 |
- | 0.6396 | 0.5888 | 300 | 0.6304 | -0.4512 | -0.6797 | 0.6627 | 0.2275 | -406.0 | -438.0 | -1.2344 | -1.1875 | -0.0007 | -0.0008 | -14.375 | -14.3125 |
- | 0.6102 | 0.7851 | 400 | 0.6268 | -0.5039 | -0.7617 | 0.6567 | 0.2578 | -414.0 | -444.0 | -1.2344 | -1.1875 | -0.0007 | -0.0007 | -14.3125 | -14.25 |
- | 0.6084 | 0.9814 | 500 | 0.6259 | -0.5234 | -0.7852 | 0.6567 | 0.2617 | -416.0 | -446.0 | -1.2422 | -1.1953 | -0.0007 | -0.0007 | -14.375 | -14.3125 |
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Logps/chosen Top Tokens | Logps/rejected Top Tokens | Logps/chosen Bottom Tokens | Logps/rejected Bottom Tokens |
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:-----------------------:|:-------------------------:|:--------------------------:|:----------------------------:|
+ | 0.678 | 0.1963 | 100 | 0.6789 | -0.0275 | -0.0608 | 0.5881 | 0.0332 | -344.0 | -396.0 | -1.1562 | -1.0938 | -0.0009 | -0.0009 | -14.0625 | -14.0 |
+ | 0.645 | 0.3925 | 200 | 0.6489 | -0.2871 | -0.4238 | 0.6448 | 0.1367 | -380.0 | -422.0 | -1.2031 | -1.1562 | -0.0009 | -0.0009 | -14.375 | -14.3125 |
+ | 0.6396 | 0.5888 | 300 | 0.6304 | -0.4512 | -0.6797 | 0.6627 | 0.2275 | -406.0 | -438.0 | -1.2344 | -1.1875 | -0.0007 | -0.0008 | -14.375 | -14.3125 |
+ | 0.6102 | 0.7851 | 400 | 0.6268 | -0.5039 | -0.7617 | 0.6567 | 0.2578 | -414.0 | -444.0 | -1.2344 | -1.1875 | -0.0007 | -0.0007 | -14.3125 | -14.25 |
+ | 0.6084 | 0.9814 | 500 | 0.6259 | -0.5234 | -0.7852 | 0.6567 | 0.2617 | -416.0 | -446.0 | -1.2422 | -1.1953 | -0.0007 | -0.0007 | -14.375 | -14.3125 |
+ | 0.6115 | 1.1776 | 600 | 0.6121 | -0.5547 | -0.8789 | 0.6806 | 0.3242 | -426.0 | -450.0 | -1.2578 | -1.2109 | -0.0006 | -0.0006 | -14.25 | -14.125 |
+ | 0.607 | 1.3739 | 700 | 0.6068 | -0.6641 | -1.0078 | 0.6985 | 0.3418 | -438.0 | -460.0 | -1.2812 | -1.2344 | -0.0006 | -0.0006 | -14.1875 | -14.125 |
+ | 0.5764 | 1.5702 | 800 | 0.5996 | -0.75 | -1.1406 | 0.6866 | 0.3887 | -452.0 | -468.0 | -1.3125 | -1.2656 | -0.0007 | -0.0007 | -14.25 | -14.125 |
+ | 0.5903 | 1.7664 | 900 | 0.5984 | -0.5898 | -0.9648 | 0.7045 | 0.3770 | -434.0 | -452.0 | -1.3125 | -1.2656 | -0.0006 | -0.0006 | -14.25 | -14.125 |
+ | 0.5697 | 1.9627 | 1000 | 0.5922 | -0.7383 | -1.1562 | 0.6866 | 0.4160 | -454.0 | -468.0 | -1.3125 | -1.2734 | -0.0007 | -0.0006 | -14.0625 | -14.0 |
+ | 0.5573 | 2.1590 | 1100 | 0.5854 | -0.8203 | -1.2812 | 0.6985 | 0.4570 | -466.0 | -476.0 | -1.3281 | -1.2891 | -0.0006 | -0.0006 | -14.125 | -14.0 |
+ | 0.5439 | 2.3553 | 1200 | 0.5845 | -1.1016 | -1.6172 | 0.6866 | 0.5078 | -498.0 | -504.0 | -1.3672 | -1.3281 | -0.0007 | -0.0006 | -14.0625 | -13.9375 |
+ | 0.5487 | 2.5515 | 1300 | 0.5801 | -0.8906 | -1.3828 | 0.6925 | 0.4980 | -476.0 | -482.0 | -1.3828 | -1.3438 | -0.0007 | -0.0006 | -14.0625 | -14.0 |
+ | 0.543 | 2.7478 | 1400 | 0.5785 | -0.8672 | -1.3516 | 0.7134 | 0.4863 | -474.0 | -480.0 | -1.375 | -1.3359 | -0.0007 | -0.0006 | -14.0625 | -13.9375 |
+ | 0.5382 | 2.9441 | 1500 | 0.5711 | -1.1172 | -1.6641 | 0.6955 | 0.5508 | -504.0 | -506.0 | -1.3906 | -1.3516 | -0.0007 | -0.0006 | -14.125 | -14.0 |
+ | 0.5117 | 3.1403 | 1600 | 0.5712 | -1.25 | -1.8281 | 0.7045 | 0.5742 | -520.0 | -520.0 | -1.3984 | -1.3594 | -0.0007 | -0.0006 | -14.125 | -14.0 |
+ | 0.4983 | 3.3366 | 1700 | 0.5703 | -1.1641 | -1.75 | 0.7015 | 0.5859 | -512.0 | -510.0 | -1.4062 | -1.3672 | -0.0007 | -0.0007 | -14.125 | -14.0 |
+ | 0.4976 | 3.5329 | 1800 | 0.5709 | -1.2656 | -1.8828 | 0.7254 | 0.6133 | -524.0 | -520.0 | -1.4141 | -1.375 | -0.0007 | -0.0007 | -14.125 | -14.0625 |
+ | 0.4956 | 3.7291 | 1900 | 0.5754 | -1.2266 | -1.8047 | 0.7164 | 0.5781 | -516.0 | -516.0 | -1.4062 | -1.3672 | -0.0008 | -0.0008 | -14.0625 | -13.9375 |
+ | 0.4996 | 3.9254 | 2000 | 0.5722 | -1.2578 | -1.8516 | 0.7045 | 0.6016 | -524.0 | -520.0 | -1.4062 | -1.375 | -0.0008 | -0.0008 | -14.0625 | -13.9375 |
+ | 0.4588 | 4.1217 | 2100 | 0.5748 | -1.4141 | -2.0312 | 0.7343 | 0.6211 | -540.0 | -536.0 | -1.4062 | -1.375 | -0.0009 | -0.0009 | -14.0 | -13.875 |
+ | 0.4555 | 4.3180 | 2200 | 0.5743 | -1.2969 | -1.9141 | 0.7164 | 0.6172 | -528.0 | -524.0 | -1.4219 | -1.3906 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
+ | 0.4625 | 4.5142 | 2300 | 0.5735 | -1.3047 | -1.9297 | 0.7134 | 0.625 | -532.0 | -524.0 | -1.4141 | -1.3828 | -0.0008 | -0.0008 | -14.0 | -13.875 |
+ | 0.469 | 4.7105 | 2400 | 0.5743 | -1.4766 | -2.1406 | 0.7194 | 0.6562 | -552.0 | -540.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.875 |
+ | 0.4796 | 4.9068 | 2500 | 0.5750 | -1.3281 | -1.9766 | 0.7134 | 0.6484 | -536.0 | -528.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.875 |
+ | 0.4082 | 5.1030 | 2600 | 0.5818 | -1.6016 | -2.2656 | 0.7194 | 0.6602 | -564.0 | -552.0 | -1.4453 | -1.4062 | -0.0009 | -0.0009 | -14.0 | -13.875 |
+ | 0.4193 | 5.2993 | 2700 | 0.5803 | -1.4922 | -2.1406 | 0.7194 | 0.6523 | -552.0 | -544.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.8125 |
+ | 0.419 | 5.4956 | 2800 | 0.5795 | -1.625 | -2.3281 | 0.7194 | 0.7031 | -572.0 | -556.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.875 |
+ | 0.4267 | 5.6919 | 2900 | 0.5780 | -1.6875 | -2.375 | 0.7134 | 0.6836 | -576.0 | -564.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
+ | 0.402 | 5.8881 | 3000 | 0.5828 | -1.6484 | -2.3594 | 0.7254 | 0.7109 | -572.0 | -560.0 | -1.4453 | -1.4062 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
+ | 0.3656 | 6.0844 | 3100 | 0.5844 | -1.6875 | -2.4062 | 0.7015 | 0.7227 | -580.0 | -564.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -14.0 | -13.875 |
+ | 0.3971 | 6.2807 | 3200 | 0.5873 | -1.6094 | -2.3281 | 0.7075 | 0.7148 | -572.0 | -556.0 | -1.4453 | -1.4141 | -0.0009 | -0.0009 | -14.0 | -13.8125 |
+ | 0.3923 | 6.4769 | 3300 | 0.5906 | -1.6875 | -2.4062 | 0.7075 | 0.7188 | -580.0 | -564.0 | -1.4453 | -1.4141 | -0.0009 | -0.0009 | -14.0 | -13.875 |
+ | 0.4011 | 6.6732 | 3400 | 0.5848 | -1.7109 | -2.4375 | 0.7254 | 0.7344 | -584.0 | -564.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -14.0 | -13.875 |
+ | 0.3838 | 6.8695 | 3500 | 0.5897 | -1.75 | -2.4844 | 0.7164 | 0.7305 | -584.0 | -568.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
+ | 0.3762 | 7.0658 | 3600 | 0.5910 | -1.7812 | -2.5312 | 0.7134 | 0.7422 | -592.0 | -572.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
+ | 0.3591 | 7.2620 | 3700 | 0.5895 | -1.7812 | -2.5312 | 0.7075 | 0.7578 | -592.0 | -572.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -14.0 | -13.875 |
+ | 0.3713 | 7.4583 | 3800 | 0.5956 | -1.7734 | -2.5312 | 0.7164 | 0.75 | -592.0 | -572.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
+ | 0.381 | 7.6546 | 3900 | 0.5948 | -1.8672 | -2.625 | 0.7164 | 0.7695 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
+ | 0.3639 | 7.8508 | 4000 | 0.5950 | -1.8672 | -2.625 | 0.7194 | 0.7578 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
+ | 0.3563 | 8.0471 | 4100 | 0.5939 | -1.8281 | -2.5781 | 0.7075 | 0.7539 | -596.0 | -576.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
+ | 0.3484 | 8.2434 | 4200 | 0.5969 | -1.875 | -2.6406 | 0.7045 | 0.7656 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -14.0 | -13.875 |
+ | 0.3359 | 8.4396 | 4300 | 0.5966 | -1.8828 | -2.6562 | 0.7045 | 0.7734 | -604.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
+ | 0.3639 | 8.6359 | 4400 | 0.5979 | -1.8516 | -2.5938 | 0.7075 | 0.7461 | -596.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
+ | 0.3563 | 8.8322 | 4500 | 0.5979 | -1.8594 | -2.625 | 0.7075 | 0.7617 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
+ | 0.353 | 9.0285 | 4600 | 0.5981 | -1.8672 | -2.625 | 0.6985 | 0.7617 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
+ | 0.3514 | 9.2247 | 4700 | 0.5979 | -1.8594 | -2.625 | 0.6985 | 0.7656 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
+ | 0.3434 | 9.4210 | 4800 | 0.5973 | -1.8672 | -2.6406 | 0.7015 | 0.7656 | -600.0 | -580.0 | -1.4297 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
+ | 0.3492 | 9.6173 | 4900 | 0.5981 | -1.875 | -2.6406 | 0.7045 | 0.7578 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
+ | 0.3487 | 9.8135 | 5000 | 0.5967 | -1.8672 | -2.6406 | 0.7134 | 0.7734 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
 
 
  ### Framework versions
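The card documents a DPO fine-tune of nnheui/pythia-1.4b-sft-full with trl (per the tags), run for 10 epochs with Adam betas (0.9, 0.999) and epsilon 1e-08, a cosine schedule, and warmup ratio 0.1; the previous revision of the card named HuggingFaceH4/ultrafeedback_binarized as the training data. Below is a minimal sketch of that setup with trl's DPOTrainer, assuming a recent trl version; the learning rate, DPO beta, and batch sizes are not visible in this diff, so those values are placeholders.

```python
# Sketch of the training setup described in the card. Only the optimizer,
# scheduler, warmup ratio, and epoch count are taken from the diff; the
# learning rate, beta, and batch sizes below are placeholders, and some
# DPOTrainer keyword names vary across trl versions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("nnheui/pythia-1.4b-sft-full")
tokenizer = AutoTokenizer.from_pretrained("nnheui/pythia-1.4b-sft-full")

# Dataset named by the previous revision of this card.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = DPOConfig(
    output_dir="pythia-1.4b-dpo-full",
    num_train_epochs=10,         # from the diff: num_epochs: 10
    lr_scheduler_type="cosine",  # from the diff
    warmup_ratio=0.1,            # from the diff
    adam_beta1=0.9,              # from the diff: betas=(0.9,0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,           # from the diff: epsilon=1e-08
    learning_rate=5e-7,          # placeholder: not shown in the visible hunks
    beta=0.1,                    # placeholder: DPO beta not shown in the card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    processing_class=tokenizer,  # older trl versions take tokenizer= instead
)
trainer.train()
```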
all_results.json CHANGED
@@ -1,5 +1,5 @@
  {
-     "epoch": 0.9990186457311089,
+     "epoch": 9.99018645731109,
      "eval_logits/chosen": -1.1953125,
      "eval_logits/rejected": -1.2421875,
      "eval_logps/chosen": -446.0,
@@ -18,9 +18,9 @@
      "eval_samples_per_second": 17.967,
      "eval_steps_per_second": 0.602,
      "total_flos": 0.0,
-     "train_loss": 0.6464882252961105,
-     "train_runtime": 8284.9703,
+     "train_loss": 0.40154263242522953,
+     "train_runtime": 75277.8699,
      "train_samples": 61134,
-     "train_samples_per_second": 7.379,
-     "train_steps_per_second": 0.061
+     "train_samples_per_second": 8.121,
+     "train_steps_per_second": 0.068
  }
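The updated throughput figures are internally consistent: train_samples_per_second ≈ train_samples × num_epochs / train_runtime, i.e. 61134 × 10 / 75277.8699 ≈ 8.121, and likewise 61134 × 1 / 8284.9703 ≈ 7.379 for the previous one-epoch run. A quick check:

```python
# Sanity check on the throughput fields in all_results.json:
# train_samples_per_second ~= train_samples * num_epochs / train_runtime.
def samples_per_second(train_samples: int, epochs: float, runtime_s: float) -> float:
    return train_samples * epochs / runtime_s

# New run (this commit): 10 epochs over 61134 samples in 75277.8699 s.
print(round(samples_per_second(61134, 10, 75277.8699), 3))  # -> 8.121

# Previous run: 1 epoch over the same samples in 8284.9703 s.
print(round(samples_per_second(61134, 1, 8284.9703), 3))    # -> 7.379
```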
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:594a8e1e9b9b0898ed02c49720acd4aef70a436027ee0174c6ad725d13da763c
+ oid sha256:6060099069aed780e713c3b7cfe9b0f92a0e7b79dc8d259e01c03c915414d839
  size 2829330208
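model.safetensors is tracked with Git LFS, so the diff only swaps the pointer's sha256 oid; the byte size is unchanged at 2829330208, as expected when the same architecture is re-saved with new weights. A sketch for verifying a downloaded copy against the new pointer (the local path is an assumption):

```python
# Verify a downloaded model.safetensors against its Git LFS pointer, which
# records the blob's SHA-256 digest and byte size.
import hashlib
import os

EXPECTED_OID = "6060099069aed780e713c3b7cfe9b0f92a0e7b79dc8d259e01c03c915414d839"
EXPECTED_SIZE = 2829330208

def verify_lfs_blob(path: str) -> bool:
    # Cheap check first: the pointer's recorded size must match.
    if os.path.getsize(path) != EXPECTED_SIZE:
        return False
    # Then hash the file in 1 MiB chunks to avoid loading ~2.8 GB at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == EXPECTED_OID

print(verify_lfs_blob("model.safetensors"))  # True if the download is intact
```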
runs/Jul08_16-05-34_42dbe5cf9ed4/events.out.tfevents.1720455302.42dbe5cf9ed4.852680.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c8ac027dd00a4a9c917aae9dcc56bd1d96e77fd9da6cb8e7f366c116031166d1
- size 487301
+ oid sha256:64692b8fb087cabdea941dbfe7e6bfdd8426c35fd90c9615306581399b8c80f1
+ size 496385
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
-     "epoch": 0.9990186457311089,
+     "epoch": 9.99018645731109,
      "total_flos": 0.0,
-     "train_loss": 0.6464882252961105,
-     "train_runtime": 8284.9703,
+     "train_loss": 0.40154263242522953,
+     "train_runtime": 75277.8699,
      "train_samples": 61134,
-     "train_samples_per_second": 7.379,
-     "train_steps_per_second": 0.061
+     "train_samples_per_second": 8.121,
+     "train_steps_per_second": 0.068
  }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
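To try the checkpoint this commit saves, it can be loaded with transformers; the Hub repo id below is an assumption inferred from the committer name and the card's model-index entry.

```python
# Load the saved checkpoint for generation. The repo id is an assumption
# (committer "nnheui" + model-index name "pythia-1.4b-dpo-full").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "nnheui/pythia-1.4b-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```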