taicheng committed
Commit 72a8978 · verified · 1 Parent(s): abc63d9

Model save

README.md ADDED
@@ -0,0 +1,78 @@
---
library_name: transformers
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-align-scan-6e-07-0.53-polynomial-2.0
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-align-scan-6e-07-0.53-polynomial-2.0

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.8639
- Rewards/chosen: 0.6923
- Rewards/rejected: -0.5434
- Rewards/accuracies: 0.3393
- Rewards/margins: 1.2357
- Logps/rejected: -82.1536
- Logps/chosen: -73.1851
- Logits/rejected: -2.6910
- Logits/chosen: -2.7068

## Model description

More information needed

## Intended uses & limitations

More information needed

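Until this section is filled in, the snippet below is a minimal inference sketch using the `transformers` API. The repository id shown is an assumption based on this repo's name, and the chat-template usage is inherited from the Zephyr SFT base; neither is stated explicitly in this card, so adjust as needed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id (not stated in this card); replace with the actual Hub path.
model_id = "taicheng/zephyr-7b-align-scan-6e-07-0.53-polynomial-2.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The Zephyr SFT base is a chat model, so format the prompt through its chat template.
messages = [{"role": "user", "content": "Explain direct preference optimization in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
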
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 6e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

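The effective batch sizes follow directly from the values above: 8 per device × 4 GPUs × 2 accumulation steps = 64 for training, and 8 × 4 = 32 for evaluation. As an illustration only, the sketch below restates these settings as `transformers.TrainingArguments`; the actual training script (a TRL DPO run, judging by the tags) is not part of this commit, and the multi-GPU layout comes from the launcher rather than from these arguments.

```python
from transformers import TrainingArguments

# Illustrative restatement of the hyperparameters listed above (not the original script).
# Effective train batch size: 8 per device * 4 GPUs * 2 accumulation steps = 64.
training_args = TrainingArguments(
    output_dir="zephyr-7b-align-scan-6e-07-0.53-polynomial-2.0",
    num_train_epochs=2,
    learning_rate=6e-07,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    lr_scheduler_type="polynomial",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```
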
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7136 | 0.3484 | 100 | 0.7109 | 1.4011 | 0.8625 | 0.3512 | 0.5386 | -79.5010 | -71.8476 | -2.5458 | -2.5618 |
| 0.7461 | 0.6969 | 200 | 0.7643 | 1.0640 | 0.3687 | 0.3274 | 0.6952 | -80.4327 | -72.4838 | -2.5601 | -2.5759 |
| 0.3949 | 1.0453 | 300 | 0.7875 | 0.2070 | -0.6350 | 0.3472 | 0.8420 | -82.3265 | -74.1006 | -2.6135 | -2.6292 |
| 0.3838 | 1.3937 | 400 | 0.8714 | 0.4396 | -0.7042 | 0.3294 | 1.1438 | -82.4571 | -73.6618 | -2.6266 | -2.6422 |
| 0.371  | 1.7422 | 500 | 0.8639 | 0.6923 | -0.5434 | 0.3393 | 1.2357 | -82.1536 | -73.1851 | -2.6910 | -2.7068 |

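For reading the table: Rewards/margins is the gap between the chosen and rejected rewards, so each row satisfies margins ≈ chosen − rejected up to rounding (e.g. the step-500 row: 0.6923 − (−0.5434) = 1.2357). A tiny check using that row:

```python
# Rewards/margins should equal Rewards/chosen - Rewards/rejected (up to rounding).
chosen, rejected, margin = 0.6923, -0.5434, 1.2357  # step-500 evaluation row
assert abs((chosen - rejected) - margin) < 1e-3
```
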
### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0
- Datasets 2.21.0
- Tokenizers 0.19.1
all_results.json ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 2.0,
    "total_flos": 0.0,
    "train_loss": 0.5567006942287139,
    "train_runtime": 6469.9642,
    "train_samples": 18340,
    "train_samples_per_second": 5.669,
    "train_steps_per_second": 0.089
}
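These throughput figures are mutually consistent: 18340 samples × 2 epochs / 6469.9642 s ≈ 5.669 samples per second, and with the effective train batch size of 64 from the README that corresponds to 574 optimizer steps (matching the global_step recorded in trainer_state.json below), i.e. 574 / 6469.9642 s ≈ 0.089 steps per second. A quick check, with values copied from the JSON above:

```python
# Consistency check for the metrics in all_results.json (batch size 64 taken from the README).
train_samples, epochs, runtime_s = 18340, 2.0, 6469.9642
samples_per_second = train_samples * epochs / runtime_s   # ~5.669
steps = -(-train_samples // 64) * int(epochs)              # ceil(18340 / 64) * 2 = 574
steps_per_second = steps / runtime_s                       # ~0.089
print(round(samples_per_second, 3), steps, round(steps_per_second, 3))
```
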
generation_config.json ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.44.2"
}
train_results.json ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 2.0,
    "total_flos": 0.0,
    "train_loss": 0.5567006942287139,
    "train_runtime": 6469.9642,
    "train_samples": 18340,
    "train_samples_per_second": 5.669,
    "train_steps_per_second": 0.089
}
trainer_state.json ADDED
@@ -0,0 +1,992 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 2.0,
5
+ "eval_steps": 100,
6
+ "global_step": 574,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.003484320557491289,
13
+ "grad_norm": 306.63621441123377,
14
+ "learning_rate": 1.0344827586206896e-08,
15
+ "logits/chosen": -2.5345611572265625,
16
+ "logits/rejected": -2.581700563430786,
17
+ "logps/chosen": -60.002105712890625,
18
+ "logps/rejected": -99.98374938964844,
19
+ "loss": 0.6931,
20
+ "rewards/accuracies": 0.0,
21
+ "rewards/chosen": 0.0,
22
+ "rewards/margins": 0.0,
23
+ "rewards/rejected": 0.0,
24
+ "step": 1
25
+ },
26
+ {
27
+ "epoch": 0.03484320557491289,
28
+ "grad_norm": 287.10820765444424,
29
+ "learning_rate": 1.0344827586206897e-07,
30
+ "logits/chosen": -2.5633163452148438,
31
+ "logits/rejected": -2.562026023864746,
32
+ "logps/chosen": -59.65489196777344,
33
+ "logps/rejected": -73.39691925048828,
34
+ "loss": 0.6954,
35
+ "rewards/accuracies": 0.2152777761220932,
36
+ "rewards/chosen": 0.0025260683614760637,
37
+ "rewards/margins": 0.011007179506123066,
38
+ "rewards/rejected": -0.008481111377477646,
39
+ "step": 10
40
+ },
41
+ {
42
+ "epoch": 0.06968641114982578,
43
+ "grad_norm": 361.11744918130063,
44
+ "learning_rate": 2.0689655172413793e-07,
45
+ "logits/chosen": -2.60577654838562,
46
+ "logits/rejected": -2.5645222663879395,
47
+ "logps/chosen": -104.05818939208984,
48
+ "logps/rejected": -94.88358306884766,
49
+ "loss": 0.6868,
50
+ "rewards/accuracies": 0.3062500059604645,
51
+ "rewards/chosen": 0.0365118607878685,
52
+ "rewards/margins": 0.03092603012919426,
53
+ "rewards/rejected": 0.005585831124335527,
54
+ "step": 20
55
+ },
56
+ {
57
+ "epoch": 0.10452961672473868,
58
+ "grad_norm": 362.0188620910404,
59
+ "learning_rate": 3.103448275862069e-07,
60
+ "logits/chosen": -2.593327045440674,
61
+ "logits/rejected": -2.573579788208008,
62
+ "logps/chosen": -82.2002944946289,
63
+ "logps/rejected": -91.45396423339844,
64
+ "loss": 0.6711,
65
+ "rewards/accuracies": 0.29374998807907104,
66
+ "rewards/chosen": 0.15702371299266815,
67
+ "rewards/margins": 0.1113327145576477,
68
+ "rewards/rejected": 0.04569100961089134,
69
+ "step": 30
70
+ },
71
+ {
72
+ "epoch": 0.13937282229965156,
73
+ "grad_norm": 283.91931715411215,
74
+ "learning_rate": 4.1379310344827586e-07,
75
+ "logits/chosen": -2.4966464042663574,
76
+ "logits/rejected": -2.4948105812072754,
77
+ "logps/chosen": -77.7404556274414,
78
+ "logps/rejected": -73.33540344238281,
79
+ "loss": 0.6457,
80
+ "rewards/accuracies": 0.3125,
81
+ "rewards/chosen": 0.09431798756122589,
82
+ "rewards/margins": 0.27138587832450867,
83
+ "rewards/rejected": -0.17706790566444397,
84
+ "step": 40
85
+ },
86
+ {
87
+ "epoch": 0.17421602787456447,
88
+ "grad_norm": 251.26626326645746,
89
+ "learning_rate": 5.172413793103448e-07,
90
+ "logits/chosen": -2.5199971199035645,
91
+ "logits/rejected": -2.5240330696105957,
92
+ "logps/chosen": -62.982337951660156,
93
+ "logps/rejected": -75.54759216308594,
94
+ "loss": 0.6705,
95
+ "rewards/accuracies": 0.2750000059604645,
96
+ "rewards/chosen": 0.5071097612380981,
97
+ "rewards/margins": 0.21250581741333008,
98
+ "rewards/rejected": 0.2946038842201233,
99
+ "step": 50
100
+ },
101
+ {
102
+ "epoch": 0.20905923344947736,
103
+ "grad_norm": 245.4785949332324,
104
+ "learning_rate": 5.98062015503876e-07,
105
+ "logits/chosen": -2.473501682281494,
106
+ "logits/rejected": -2.4677951335906982,
107
+ "logps/chosen": -70.63660430908203,
108
+ "logps/rejected": -66.41564178466797,
109
+ "loss": 0.6582,
110
+ "rewards/accuracies": 0.32499998807907104,
111
+ "rewards/chosen": 1.3620513677597046,
112
+ "rewards/margins": 0.30580419301986694,
113
+ "rewards/rejected": 1.0562469959259033,
114
+ "step": 60
115
+ },
116
+ {
117
+ "epoch": 0.24390243902439024,
118
+ "grad_norm": 268.1673884309266,
119
+ "learning_rate": 5.883720930232558e-07,
120
+ "logits/chosen": -2.48606538772583,
121
+ "logits/rejected": -2.4808874130249023,
122
+ "logps/chosen": -60.53791046142578,
123
+ "logps/rejected": -65.51335906982422,
124
+ "loss": 0.669,
125
+ "rewards/accuracies": 0.3187499940395355,
126
+ "rewards/chosen": 1.9600938558578491,
127
+ "rewards/margins": 0.41606950759887695,
128
+ "rewards/rejected": 1.5440242290496826,
129
+ "step": 70
130
+ },
131
+ {
132
+ "epoch": 0.2787456445993031,
133
+ "grad_norm": 266.9849201621109,
134
+ "learning_rate": 5.786821705426356e-07,
135
+ "logits/chosen": -2.44217586517334,
136
+ "logits/rejected": -2.432021379470825,
137
+ "logps/chosen": -71.77671813964844,
138
+ "logps/rejected": -74.41423797607422,
139
+ "loss": 0.6931,
140
+ "rewards/accuracies": 0.3062500059604645,
141
+ "rewards/chosen": 2.0607221126556396,
142
+ "rewards/margins": 0.417347252368927,
143
+ "rewards/rejected": 1.643375039100647,
144
+ "step": 80
145
+ },
146
+ {
147
+ "epoch": 0.313588850174216,
148
+ "grad_norm": 299.0911881932943,
149
+ "learning_rate": 5.689922480620155e-07,
150
+ "logits/chosen": -2.486508846282959,
151
+ "logits/rejected": -2.5009188652038574,
152
+ "logps/chosen": -62.32392120361328,
153
+ "logps/rejected": -67.05072784423828,
154
+ "loss": 0.745,
155
+ "rewards/accuracies": 0.29374998807907104,
156
+ "rewards/chosen": 1.938929557800293,
157
+ "rewards/margins": 0.3153690993785858,
158
+ "rewards/rejected": 1.6235605478286743,
159
+ "step": 90
160
+ },
161
+ {
162
+ "epoch": 0.34843205574912894,
163
+ "grad_norm": 316.56572577244293,
164
+ "learning_rate": 5.593023255813953e-07,
165
+ "logits/chosen": -2.4774889945983887,
166
+ "logits/rejected": -2.4782986640930176,
167
+ "logps/chosen": -71.84193420410156,
168
+ "logps/rejected": -78.91864013671875,
169
+ "loss": 0.7136,
170
+ "rewards/accuracies": 0.32499998807907104,
171
+ "rewards/chosen": 1.8174957036972046,
172
+ "rewards/margins": 0.686999499797821,
173
+ "rewards/rejected": 1.1304961442947388,
174
+ "step": 100
175
+ },
176
+ {
177
+ "epoch": 0.34843205574912894,
178
+ "eval_logits/chosen": -2.561777114868164,
179
+ "eval_logits/rejected": -2.545793294906616,
180
+ "eval_logps/chosen": -71.84760284423828,
181
+ "eval_logps/rejected": -79.50099182128906,
182
+ "eval_loss": 0.7108728885650635,
183
+ "eval_rewards/accuracies": 0.3511904776096344,
184
+ "eval_rewards/chosen": 1.4011276960372925,
185
+ "eval_rewards/margins": 0.5386245846748352,
186
+ "eval_rewards/rejected": 0.8625030517578125,
187
+ "eval_runtime": 113.5305,
188
+ "eval_samples_per_second": 17.616,
189
+ "eval_steps_per_second": 0.555,
190
+ "step": 100
191
+ },
192
+ {
193
+ "epoch": 0.3832752613240418,
194
+ "grad_norm": 345.3192427581726,
195
+ "learning_rate": 5.496124031007752e-07,
196
+ "logits/chosen": -2.500793933868408,
197
+ "logits/rejected": -2.4660115242004395,
198
+ "logps/chosen": -72.03620910644531,
199
+ "logps/rejected": -62.499176025390625,
200
+ "loss": 0.7594,
201
+ "rewards/accuracies": 0.28125,
202
+ "rewards/chosen": 0.6009193658828735,
203
+ "rewards/margins": 0.3298734426498413,
204
+ "rewards/rejected": 0.2710459232330322,
205
+ "step": 110
206
+ },
207
+ {
208
+ "epoch": 0.4181184668989547,
209
+ "grad_norm": 243.7872401824864,
210
+ "learning_rate": 5.399224806201551e-07,
211
+ "logits/chosen": -2.5411460399627686,
212
+ "logits/rejected": -2.510791778564453,
213
+ "logps/chosen": -76.94505310058594,
214
+ "logps/rejected": -67.3199462890625,
215
+ "loss": 0.704,
216
+ "rewards/accuracies": 0.29374998807907104,
217
+ "rewards/chosen": 0.27375704050064087,
218
+ "rewards/margins": 0.6173890829086304,
219
+ "rewards/rejected": -0.3436321020126343,
220
+ "step": 120
221
+ },
222
+ {
223
+ "epoch": 0.4529616724738676,
224
+ "grad_norm": 432.4880968925023,
225
+ "learning_rate": 5.302325581395349e-07,
226
+ "logits/chosen": -2.5748209953308105,
227
+ "logits/rejected": -2.5566792488098145,
228
+ "logps/chosen": -83.82911682128906,
229
+ "logps/rejected": -89.06166076660156,
230
+ "loss": 0.7703,
231
+ "rewards/accuracies": 0.35624998807907104,
232
+ "rewards/chosen": 0.027380788698792458,
233
+ "rewards/margins": 0.8505465388298035,
234
+ "rewards/rejected": -0.823165774345398,
235
+ "step": 130
236
+ },
237
+ {
238
+ "epoch": 0.4878048780487805,
239
+ "grad_norm": 213.43504772096108,
240
+ "learning_rate": 5.205426356589147e-07,
241
+ "logits/chosen": -2.4663920402526855,
242
+ "logits/rejected": -2.456141233444214,
243
+ "logps/chosen": -79.90180969238281,
244
+ "logps/rejected": -71.20616149902344,
245
+ "loss": 0.7043,
246
+ "rewards/accuracies": 0.3499999940395355,
247
+ "rewards/chosen": 0.6591196060180664,
248
+ "rewards/margins": 0.893332302570343,
249
+ "rewards/rejected": -0.23421280086040497,
250
+ "step": 140
251
+ },
252
+ {
253
+ "epoch": 0.5226480836236934,
254
+ "grad_norm": 314.59295726808995,
255
+ "learning_rate": 5.108527131782946e-07,
256
+ "logits/chosen": -2.5461785793304443,
257
+ "logits/rejected": -2.5047733783721924,
258
+ "logps/chosen": -77.75875091552734,
259
+ "logps/rejected": -79.21798706054688,
260
+ "loss": 0.7534,
261
+ "rewards/accuracies": 0.30000001192092896,
262
+ "rewards/chosen": 0.9145609140396118,
263
+ "rewards/margins": 0.7381798624992371,
264
+ "rewards/rejected": 0.17638106644153595,
265
+ "step": 150
266
+ },
267
+ {
268
+ "epoch": 0.5574912891986062,
269
+ "grad_norm": 288.3107962146333,
270
+ "learning_rate": 5.011627906976744e-07,
271
+ "logits/chosen": -2.5273547172546387,
272
+ "logits/rejected": -2.5466020107269287,
273
+ "logps/chosen": -62.79814910888672,
274
+ "logps/rejected": -71.2327880859375,
275
+ "loss": 0.7312,
276
+ "rewards/accuracies": 0.26249998807907104,
277
+ "rewards/chosen": 0.9151515960693359,
278
+ "rewards/margins": 0.5228849649429321,
279
+ "rewards/rejected": 0.3922666311264038,
280
+ "step": 160
281
+ },
282
+ {
283
+ "epoch": 0.5923344947735192,
284
+ "grad_norm": 324.81976956861786,
285
+ "learning_rate": 4.914728682170542e-07,
286
+ "logits/chosen": -2.5521976947784424,
287
+ "logits/rejected": -2.5384509563446045,
288
+ "logps/chosen": -66.97964477539062,
289
+ "logps/rejected": -75.58006286621094,
290
+ "loss": 0.7191,
291
+ "rewards/accuracies": 0.29374998807907104,
292
+ "rewards/chosen": 1.0535058975219727,
293
+ "rewards/margins": 0.664495587348938,
294
+ "rewards/rejected": 0.3890102505683899,
295
+ "step": 170
296
+ },
297
+ {
298
+ "epoch": 0.627177700348432,
299
+ "grad_norm": 292.82490566453185,
300
+ "learning_rate": 4.817829457364341e-07,
301
+ "logits/chosen": -2.5870025157928467,
302
+ "logits/rejected": -2.5761828422546387,
303
+ "logps/chosen": -88.94615936279297,
304
+ "logps/rejected": -84.6361083984375,
305
+ "loss": 0.7825,
306
+ "rewards/accuracies": 0.33125001192092896,
307
+ "rewards/chosen": 1.5428403615951538,
308
+ "rewards/margins": 0.5494336485862732,
309
+ "rewards/rejected": 0.9934068918228149,
310
+ "step": 180
311
+ },
312
+ {
313
+ "epoch": 0.662020905923345,
314
+ "grad_norm": 234.8773688372816,
315
+ "learning_rate": 4.7209302325581395e-07,
316
+ "logits/chosen": -2.5796515941619873,
317
+ "logits/rejected": -2.57136607170105,
318
+ "logps/chosen": -68.72258758544922,
319
+ "logps/rejected": -79.8985595703125,
320
+ "loss": 0.7347,
321
+ "rewards/accuracies": 0.2874999940395355,
322
+ "rewards/chosen": 1.2343099117279053,
323
+ "rewards/margins": 0.37060627341270447,
324
+ "rewards/rejected": 0.8637038469314575,
325
+ "step": 190
326
+ },
327
+ {
328
+ "epoch": 0.6968641114982579,
329
+ "grad_norm": 439.89262521049136,
330
+ "learning_rate": 4.6240310077519373e-07,
331
+ "logits/chosen": -2.605844259262085,
332
+ "logits/rejected": -2.612717390060425,
333
+ "logps/chosen": -87.73692321777344,
334
+ "logps/rejected": -90.65494537353516,
335
+ "loss": 0.7461,
336
+ "rewards/accuracies": 0.36250001192092896,
337
+ "rewards/chosen": 1.5472986698150635,
338
+ "rewards/margins": 0.8075531125068665,
339
+ "rewards/rejected": 0.739745557308197,
340
+ "step": 200
341
+ },
342
+ {
343
+ "epoch": 0.6968641114982579,
344
+ "eval_logits/chosen": -2.5759267807006836,
345
+ "eval_logits/rejected": -2.5600757598876953,
346
+ "eval_logps/chosen": -72.48377227783203,
347
+ "eval_logps/rejected": -80.43267822265625,
348
+ "eval_loss": 0.7642679214477539,
349
+ "eval_rewards/accuracies": 0.3273809552192688,
350
+ "eval_rewards/chosen": 1.063955545425415,
351
+ "eval_rewards/margins": 0.6952447295188904,
352
+ "eval_rewards/rejected": 0.36871081590652466,
353
+ "eval_runtime": 113.5756,
354
+ "eval_samples_per_second": 17.609,
355
+ "eval_steps_per_second": 0.555,
356
+ "step": 200
357
+ },
358
+ {
359
+ "epoch": 0.7317073170731707,
360
+ "grad_norm": 437.59995338780175,
361
+ "learning_rate": 4.527131782945735e-07,
362
+ "logits/chosen": -2.5814366340637207,
363
+ "logits/rejected": -2.556798219680786,
364
+ "logps/chosen": -67.51020812988281,
365
+ "logps/rejected": -63.342933654785156,
366
+ "loss": 0.7354,
367
+ "rewards/accuracies": 0.35624998807907104,
368
+ "rewards/chosen": 0.9586971402168274,
369
+ "rewards/margins": 0.9261114001274109,
370
+ "rewards/rejected": 0.03258571773767471,
371
+ "step": 210
372
+ },
373
+ {
374
+ "epoch": 0.7665505226480837,
375
+ "grad_norm": 270.9450930053244,
376
+ "learning_rate": 4.4302325581395346e-07,
377
+ "logits/chosen": -2.6260294914245605,
378
+ "logits/rejected": -2.6077115535736084,
379
+ "logps/chosen": -71.33647155761719,
380
+ "logps/rejected": -70.29251861572266,
381
+ "loss": 0.7602,
382
+ "rewards/accuracies": 0.23125000298023224,
383
+ "rewards/chosen": 1.1245003938674927,
384
+ "rewards/margins": 0.3980458974838257,
385
+ "rewards/rejected": 0.7264544367790222,
386
+ "step": 220
387
+ },
388
+ {
389
+ "epoch": 0.8013937282229965,
390
+ "grad_norm": 365.5474496656497,
391
+ "learning_rate": 4.3333333333333335e-07,
392
+ "logits/chosen": -2.6431806087493896,
393
+ "logits/rejected": -2.623713493347168,
394
+ "logps/chosen": -86.96244812011719,
395
+ "logps/rejected": -87.527587890625,
396
+ "loss": 0.7878,
397
+ "rewards/accuracies": 0.375,
398
+ "rewards/chosen": 1.6102378368377686,
399
+ "rewards/margins": 1.2809720039367676,
400
+ "rewards/rejected": 0.329265832901001,
401
+ "step": 230
402
+ },
403
+ {
404
+ "epoch": 0.8362369337979094,
405
+ "grad_norm": 311.80252726932434,
406
+ "learning_rate": 4.2364341085271313e-07,
407
+ "logits/chosen": -2.643277406692505,
408
+ "logits/rejected": -2.609691619873047,
409
+ "logps/chosen": -83.11528015136719,
410
+ "logps/rejected": -77.69151306152344,
411
+ "loss": 0.8033,
412
+ "rewards/accuracies": 0.36250001192092896,
413
+ "rewards/chosen": 1.9219558238983154,
414
+ "rewards/margins": 0.6263306736946106,
415
+ "rewards/rejected": 1.2956254482269287,
416
+ "step": 240
417
+ },
418
+ {
419
+ "epoch": 0.8710801393728222,
420
+ "grad_norm": 352.8980579572314,
421
+ "learning_rate": 4.13953488372093e-07,
422
+ "logits/chosen": -2.65216064453125,
423
+ "logits/rejected": -2.617506504058838,
424
+ "logps/chosen": -91.78350067138672,
425
+ "logps/rejected": -87.70478820800781,
426
+ "loss": 0.6512,
427
+ "rewards/accuracies": 0.38749998807907104,
428
+ "rewards/chosen": 1.942857027053833,
429
+ "rewards/margins": 0.7549124956130981,
430
+ "rewards/rejected": 1.1879446506500244,
431
+ "step": 250
432
+ },
433
+ {
434
+ "epoch": 0.9059233449477352,
435
+ "grad_norm": 265.7817814419997,
436
+ "learning_rate": 4.0426356589147286e-07,
437
+ "logits/chosen": -2.5617775917053223,
438
+ "logits/rejected": -2.5759947299957275,
439
+ "logps/chosen": -56.67157745361328,
440
+ "logps/rejected": -64.38258361816406,
441
+ "loss": 0.7944,
442
+ "rewards/accuracies": 0.30000001192092896,
443
+ "rewards/chosen": 1.26022469997406,
444
+ "rewards/margins": 0.6000041961669922,
445
+ "rewards/rejected": 0.6602205038070679,
446
+ "step": 260
447
+ },
448
+ {
449
+ "epoch": 0.9407665505226481,
450
+ "grad_norm": 513.5414604532187,
451
+ "learning_rate": 3.9457364341085264e-07,
452
+ "logits/chosen": -2.6499624252319336,
453
+ "logits/rejected": -2.650109052658081,
454
+ "logps/chosen": -66.84712219238281,
455
+ "logps/rejected": -82.05715942382812,
456
+ "loss": 0.7273,
457
+ "rewards/accuracies": 0.3375000059604645,
458
+ "rewards/chosen": 1.0775161981582642,
459
+ "rewards/margins": 0.8259714841842651,
460
+ "rewards/rejected": 0.2515445053577423,
461
+ "step": 270
462
+ },
463
+ {
464
+ "epoch": 0.975609756097561,
465
+ "grad_norm": 338.03485816068525,
466
+ "learning_rate": 3.848837209302326e-07,
467
+ "logits/chosen": -2.550506591796875,
468
+ "logits/rejected": -2.529942512512207,
469
+ "logps/chosen": -65.82142639160156,
470
+ "logps/rejected": -70.44654846191406,
471
+ "loss": 0.6677,
472
+ "rewards/accuracies": 0.3187499940395355,
473
+ "rewards/chosen": 1.0668643712997437,
474
+ "rewards/margins": 0.8612996339797974,
475
+ "rewards/rejected": 0.20556476712226868,
476
+ "step": 280
477
+ },
478
+ {
479
+ "epoch": 1.0104529616724738,
480
+ "grad_norm": 43.71804076421765,
481
+ "learning_rate": 3.7519379844961237e-07,
482
+ "logits/chosen": -2.5674736499786377,
483
+ "logits/rejected": -2.54020357131958,
484
+ "logps/chosen": -68.04539489746094,
485
+ "logps/rejected": -65.61439514160156,
486
+ "loss": 0.5817,
487
+ "rewards/accuracies": 0.3812499940395355,
488
+ "rewards/chosen": 2.1318130493164062,
489
+ "rewards/margins": 2.503080368041992,
490
+ "rewards/rejected": -0.37126731872558594,
491
+ "step": 290
492
+ },
493
+ {
494
+ "epoch": 1.0452961672473868,
495
+ "grad_norm": 14.731335253505609,
496
+ "learning_rate": 3.6550387596899226e-07,
497
+ "logits/chosen": -2.6065874099731445,
498
+ "logits/rejected": -2.5927734375,
499
+ "logps/chosen": -59.9798698425293,
500
+ "logps/rejected": -77.5359115600586,
501
+ "loss": 0.3949,
502
+ "rewards/accuracies": 0.42500001192092896,
503
+ "rewards/chosen": 3.481590747833252,
504
+ "rewards/margins": 7.915454864501953,
505
+ "rewards/rejected": -4.433863639831543,
506
+ "step": 300
507
+ },
508
+ {
509
+ "epoch": 1.0452961672473868,
510
+ "eval_logits/chosen": -2.6291775703430176,
511
+ "eval_logits/rejected": -2.6134917736053467,
512
+ "eval_logps/chosen": -74.10063934326172,
513
+ "eval_logps/rejected": -82.3265380859375,
514
+ "eval_loss": 0.7874619364738464,
515
+ "eval_rewards/accuracies": 0.3472222089767456,
516
+ "eval_rewards/chosen": 0.20701055228710175,
517
+ "eval_rewards/margins": 0.8420494794845581,
518
+ "eval_rewards/rejected": -0.6350388526916504,
519
+ "eval_runtime": 113.6067,
520
+ "eval_samples_per_second": 17.605,
521
+ "eval_steps_per_second": 0.555,
522
+ "step": 300
523
+ },
524
+ {
525
+ "epoch": 1.0801393728222997,
526
+ "grad_norm": 4.834789646446964,
527
+ "learning_rate": 3.558139534883721e-07,
528
+ "logits/chosen": -2.578672409057617,
529
+ "logits/rejected": -2.5800602436065674,
530
+ "logps/chosen": -61.31939697265625,
531
+ "logps/rejected": -86.94207000732422,
532
+ "loss": 0.4017,
533
+ "rewards/accuracies": 0.4375,
534
+ "rewards/chosen": 3.34814715385437,
535
+ "rewards/margins": 9.546571731567383,
536
+ "rewards/rejected": -6.198423862457275,
537
+ "step": 310
538
+ },
539
+ {
540
+ "epoch": 1.1149825783972125,
541
+ "grad_norm": 99.9815974317529,
542
+ "learning_rate": 3.46124031007752e-07,
543
+ "logits/chosen": -2.6197047233581543,
544
+ "logits/rejected": -2.6059112548828125,
545
+ "logps/chosen": -68.153076171875,
546
+ "logps/rejected": -89.3502426147461,
547
+ "loss": 0.3832,
548
+ "rewards/accuracies": 0.46875,
549
+ "rewards/chosen": 3.07761812210083,
550
+ "rewards/margins": 8.979107856750488,
551
+ "rewards/rejected": -5.9014892578125,
552
+ "step": 320
553
+ },
554
+ {
555
+ "epoch": 1.1498257839721253,
556
+ "grad_norm": 65.16657207151003,
557
+ "learning_rate": 3.3643410852713177e-07,
558
+ "logits/chosen": -2.6080145835876465,
559
+ "logits/rejected": -2.584524631500244,
560
+ "logps/chosen": -77.22286224365234,
561
+ "logps/rejected": -88.59306335449219,
562
+ "loss": 0.3571,
563
+ "rewards/accuracies": 0.5249999761581421,
564
+ "rewards/chosen": 3.6562671661376953,
565
+ "rewards/margins": 8.938417434692383,
566
+ "rewards/rejected": -5.282149791717529,
567
+ "step": 330
568
+ },
569
+ {
570
+ "epoch": 1.1846689895470384,
571
+ "grad_norm": 20.064441334453125,
572
+ "learning_rate": 3.267441860465116e-07,
573
+ "logits/chosen": -2.58263897895813,
574
+ "logits/rejected": -2.5866923332214355,
575
+ "logps/chosen": -76.30878448486328,
576
+ "logps/rejected": -106.47459411621094,
577
+ "loss": 0.3758,
578
+ "rewards/accuracies": 0.5562499761581421,
579
+ "rewards/chosen": 3.9426658153533936,
580
+ "rewards/margins": 9.874300003051758,
581
+ "rewards/rejected": -5.931633949279785,
582
+ "step": 340
583
+ },
584
+ {
585
+ "epoch": 1.2195121951219512,
586
+ "grad_norm": 71.91053721699969,
587
+ "learning_rate": 3.170542635658915e-07,
588
+ "logits/chosen": -2.5975940227508545,
589
+ "logits/rejected": -2.565171003341675,
590
+ "logps/chosen": -61.74261474609375,
591
+ "logps/rejected": -74.67215728759766,
592
+ "loss": 0.3855,
593
+ "rewards/accuracies": 0.48750001192092896,
594
+ "rewards/chosen": 3.632841110229492,
595
+ "rewards/margins": 8.399141311645508,
596
+ "rewards/rejected": -4.766300201416016,
597
+ "step": 350
598
+ },
599
+ {
600
+ "epoch": 1.254355400696864,
601
+ "grad_norm": 78.70073770358177,
602
+ "learning_rate": 3.073643410852713e-07,
603
+ "logits/chosen": -2.59340238571167,
604
+ "logits/rejected": -2.566377639770508,
605
+ "logps/chosen": -65.64441680908203,
606
+ "logps/rejected": -72.45703125,
607
+ "loss": 0.3676,
608
+ "rewards/accuracies": 0.44999998807907104,
609
+ "rewards/chosen": 3.2890095710754395,
610
+ "rewards/margins": 6.889220237731934,
611
+ "rewards/rejected": -3.6002116203308105,
612
+ "step": 360
613
+ },
614
+ {
615
+ "epoch": 1.289198606271777,
616
+ "grad_norm": 42.412792949129724,
617
+ "learning_rate": 2.9767441860465116e-07,
618
+ "logits/chosen": -2.5649404525756836,
619
+ "logits/rejected": -2.5833497047424316,
620
+ "logps/chosen": -64.13664245605469,
621
+ "logps/rejected": -83.73193359375,
622
+ "loss": 0.399,
623
+ "rewards/accuracies": 0.46875,
624
+ "rewards/chosen": 3.6725857257843018,
625
+ "rewards/margins": 8.653525352478027,
626
+ "rewards/rejected": -4.980940818786621,
627
+ "step": 370
628
+ },
629
+ {
630
+ "epoch": 1.32404181184669,
631
+ "grad_norm": 89.8718675271705,
632
+ "learning_rate": 2.87984496124031e-07,
633
+ "logits/chosen": -2.576333999633789,
634
+ "logits/rejected": -2.5752272605895996,
635
+ "logps/chosen": -77.921875,
636
+ "logps/rejected": -98.92057800292969,
637
+ "loss": 0.3699,
638
+ "rewards/accuracies": 0.5562499761581421,
639
+ "rewards/chosen": 4.744899749755859,
640
+ "rewards/margins": 11.345683097839355,
641
+ "rewards/rejected": -6.600783348083496,
642
+ "step": 380
643
+ },
644
+ {
645
+ "epoch": 1.3588850174216027,
646
+ "grad_norm": 53.88569450482403,
647
+ "learning_rate": 2.7829457364341084e-07,
648
+ "logits/chosen": -2.6578681468963623,
649
+ "logits/rejected": -2.6431689262390137,
650
+ "logps/chosen": -59.8434944152832,
651
+ "logps/rejected": -80.90740203857422,
652
+ "loss": 0.3745,
653
+ "rewards/accuracies": 0.4437499940395355,
654
+ "rewards/chosen": 3.790491819381714,
655
+ "rewards/margins": 8.903260231018066,
656
+ "rewards/rejected": -5.112768650054932,
657
+ "step": 390
658
+ },
659
+ {
660
+ "epoch": 1.3937282229965158,
661
+ "grad_norm": 103.59808023860636,
662
+ "learning_rate": 2.686046511627907e-07,
663
+ "logits/chosen": -2.638619899749756,
664
+ "logits/rejected": -2.609290361404419,
665
+ "logps/chosen": -78.6235122680664,
666
+ "logps/rejected": -108.10832214355469,
667
+ "loss": 0.3838,
668
+ "rewards/accuracies": 0.5249999761581421,
669
+ "rewards/chosen": 3.987473249435425,
670
+ "rewards/margins": 9.130887985229492,
671
+ "rewards/rejected": -5.1434149742126465,
672
+ "step": 400
673
+ },
674
+ {
675
+ "epoch": 1.3937282229965158,
676
+ "eval_logits/chosen": -2.6421802043914795,
677
+ "eval_logits/rejected": -2.626558542251587,
678
+ "eval_logps/chosen": -73.66181945800781,
679
+ "eval_logps/rejected": -82.45710754394531,
680
+ "eval_loss": 0.871368408203125,
681
+ "eval_rewards/accuracies": 0.329365074634552,
682
+ "eval_rewards/chosen": 0.4395846724510193,
683
+ "eval_rewards/margins": 1.1438220739364624,
684
+ "eval_rewards/rejected": -0.7042374610900879,
685
+ "eval_runtime": 113.5108,
686
+ "eval_samples_per_second": 17.619,
687
+ "eval_steps_per_second": 0.555,
688
+ "step": 400
689
+ },
690
+ {
691
+ "epoch": 1.4285714285714286,
692
+ "grad_norm": 0.4652582217397593,
693
+ "learning_rate": 2.589147286821705e-07,
694
+ "logits/chosen": -2.625115156173706,
695
+ "logits/rejected": -2.6144914627075195,
696
+ "logps/chosen": -74.08203887939453,
697
+ "logps/rejected": -89.65892791748047,
698
+ "loss": 0.5043,
699
+ "rewards/accuracies": 0.48750001192092896,
700
+ "rewards/chosen": 4.521576881408691,
701
+ "rewards/margins": 9.441361427307129,
702
+ "rewards/rejected": -4.919784069061279,
703
+ "step": 410
704
+ },
705
+ {
706
+ "epoch": 1.4634146341463414,
707
+ "grad_norm": 14.542963331044218,
708
+ "learning_rate": 2.492248062015504e-07,
709
+ "logits/chosen": -2.6700663566589355,
710
+ "logits/rejected": -2.671051502227783,
711
+ "logps/chosen": -70.02223205566406,
712
+ "logps/rejected": -93.91789245605469,
713
+ "loss": 0.3962,
714
+ "rewards/accuracies": 0.4437499940395355,
715
+ "rewards/chosen": 3.217597484588623,
716
+ "rewards/margins": 8.012170791625977,
717
+ "rewards/rejected": -4.794573783874512,
718
+ "step": 420
719
+ },
720
+ {
721
+ "epoch": 1.4982578397212545,
722
+ "grad_norm": 7.616609761831896,
723
+ "learning_rate": 2.3953488372093024e-07,
724
+ "logits/chosen": -2.6445345878601074,
725
+ "logits/rejected": -2.630586624145508,
726
+ "logps/chosen": -61.7302360534668,
727
+ "logps/rejected": -81.66353607177734,
728
+ "loss": 0.3812,
729
+ "rewards/accuracies": 0.45625001192092896,
730
+ "rewards/chosen": 3.6740145683288574,
731
+ "rewards/margins": 8.53648567199707,
732
+ "rewards/rejected": -4.862471580505371,
733
+ "step": 430
734
+ },
735
+ {
736
+ "epoch": 1.533101045296167,
737
+ "grad_norm": 19.40227524240624,
738
+ "learning_rate": 2.2984496124031007e-07,
739
+ "logits/chosen": -2.639118194580078,
740
+ "logits/rejected": -2.648719310760498,
741
+ "logps/chosen": -61.67310333251953,
742
+ "logps/rejected": -81.26171875,
743
+ "loss": 0.3814,
744
+ "rewards/accuracies": 0.4312500059604645,
745
+ "rewards/chosen": 4.401946067810059,
746
+ "rewards/margins": 8.5157470703125,
747
+ "rewards/rejected": -4.113801002502441,
748
+ "step": 440
749
+ },
750
+ {
751
+ "epoch": 1.5679442508710801,
752
+ "grad_norm": 11.176099808562324,
753
+ "learning_rate": 2.201550387596899e-07,
754
+ "logits/chosen": -2.7013635635375977,
755
+ "logits/rejected": -2.6666698455810547,
756
+ "logps/chosen": -81.5969009399414,
757
+ "logps/rejected": -96.87474060058594,
758
+ "loss": 0.3926,
759
+ "rewards/accuracies": 0.518750011920929,
760
+ "rewards/chosen": 4.899897575378418,
761
+ "rewards/margins": 10.683164596557617,
762
+ "rewards/rejected": -5.783267021179199,
763
+ "step": 450
764
+ },
765
+ {
766
+ "epoch": 1.6027874564459932,
767
+ "grad_norm": 123.87369972777346,
768
+ "learning_rate": 2.1046511627906974e-07,
769
+ "logits/chosen": -2.6888933181762695,
770
+ "logits/rejected": -2.6685612201690674,
771
+ "logps/chosen": -66.90978240966797,
772
+ "logps/rejected": -88.9155044555664,
773
+ "loss": 0.4047,
774
+ "rewards/accuracies": 0.5,
775
+ "rewards/chosen": 4.9129533767700195,
776
+ "rewards/margins": 11.225677490234375,
777
+ "rewards/rejected": -6.312723636627197,
778
+ "step": 460
779
+ },
780
+ {
781
+ "epoch": 1.6376306620209058,
782
+ "grad_norm": 3.43139199879915,
783
+ "learning_rate": 2.0077519379844966e-07,
784
+ "logits/chosen": -2.6837058067321777,
785
+ "logits/rejected": -2.674848794937134,
786
+ "logps/chosen": -54.61207962036133,
787
+ "logps/rejected": -79.80362701416016,
788
+ "loss": 0.3979,
789
+ "rewards/accuracies": 0.40625,
790
+ "rewards/chosen": 3.5458691120147705,
791
+ "rewards/margins": 7.8562331199646,
792
+ "rewards/rejected": -4.310364246368408,
793
+ "step": 470
794
+ },
795
+ {
796
+ "epoch": 1.6724738675958188,
797
+ "grad_norm": 12.439142544759255,
798
+ "learning_rate": 1.9108527131782944e-07,
799
+ "logits/chosen": -2.695263385772705,
800
+ "logits/rejected": -2.6781885623931885,
801
+ "logps/chosen": -48.82516860961914,
802
+ "logps/rejected": -57.3747444152832,
803
+ "loss": 0.4033,
804
+ "rewards/accuracies": 0.3687500059604645,
805
+ "rewards/chosen": 2.775264024734497,
806
+ "rewards/margins": 5.633866786956787,
807
+ "rewards/rejected": -2.8586020469665527,
808
+ "step": 480
809
+ },
810
+ {
811
+ "epoch": 1.7073170731707317,
812
+ "grad_norm": 40.080099001713826,
813
+ "learning_rate": 1.8139534883720925e-07,
814
+ "logits/chosen": -2.6758790016174316,
815
+ "logits/rejected": -2.6650328636169434,
816
+ "logps/chosen": -66.64186096191406,
817
+ "logps/rejected": -78.38505554199219,
818
+ "loss": 0.4696,
819
+ "rewards/accuracies": 0.38749998807907104,
820
+ "rewards/chosen": 3.4873664379119873,
821
+ "rewards/margins": 7.898496150970459,
822
+ "rewards/rejected": -4.411130428314209,
823
+ "step": 490
824
+ },
825
+ {
826
+ "epoch": 1.7421602787456445,
827
+ "grad_norm": 47.95492405495267,
828
+ "learning_rate": 1.7170542635658914e-07,
829
+ "logits/chosen": -2.593526601791382,
830
+ "logits/rejected": -2.5879125595092773,
831
+ "logps/chosen": -68.72013854980469,
832
+ "logps/rejected": -92.43871307373047,
833
+ "loss": 0.371,
834
+ "rewards/accuracies": 0.46875,
835
+ "rewards/chosen": 4.010916709899902,
836
+ "rewards/margins": 8.630084991455078,
837
+ "rewards/rejected": -4.619168281555176,
838
+ "step": 500
839
+ },
840
+ {
841
+ "epoch": 1.7421602787456445,
842
+ "eval_logits/chosen": -2.7067761421203613,
843
+ "eval_logits/rejected": -2.691006898880005,
844
+ "eval_logps/chosen": -73.18505859375,
845
+ "eval_logps/rejected": -82.15363311767578,
846
+ "eval_loss": 0.863908588886261,
847
+ "eval_rewards/accuracies": 0.3392857015132904,
848
+ "eval_rewards/chosen": 0.6922710537910461,
849
+ "eval_rewards/margins": 1.2356635332107544,
850
+ "eval_rewards/rejected": -0.5433923602104187,
851
+ "eval_runtime": 113.7753,
852
+ "eval_samples_per_second": 17.579,
853
+ "eval_steps_per_second": 0.554,
854
+ "step": 500
855
+ },
856
+ {
857
+ "epoch": 1.7770034843205575,
858
+ "grad_norm": 43.89133813372592,
859
+ "learning_rate": 1.6201550387596898e-07,
860
+ "logits/chosen": -2.665417432785034,
861
+ "logits/rejected": -2.647148609161377,
862
+ "logps/chosen": -63.23058319091797,
863
+ "logps/rejected": -77.56340026855469,
864
+ "loss": 0.3821,
865
+ "rewards/accuracies": 0.45625001192092896,
866
+ "rewards/chosen": 3.9886889457702637,
867
+ "rewards/margins": 8.217727661132812,
868
+ "rewards/rejected": -4.229039192199707,
869
+ "step": 510
870
+ },
871
+ {
872
+ "epoch": 1.8118466898954704,
873
+ "grad_norm": 26.582330424497826,
874
+ "learning_rate": 1.523255813953488e-07,
875
+ "logits/chosen": -2.671867847442627,
876
+ "logits/rejected": -2.666949510574341,
877
+ "logps/chosen": -66.01771545410156,
878
+ "logps/rejected": -86.70332336425781,
879
+ "loss": 0.382,
880
+ "rewards/accuracies": 0.46875,
881
+ "rewards/chosen": 4.374355316162109,
882
+ "rewards/margins": 9.404329299926758,
883
+ "rewards/rejected": -5.029973983764648,
884
+ "step": 520
885
+ },
886
+ {
887
+ "epoch": 1.8466898954703832,
888
+ "grad_norm": 16.01608311357976,
889
+ "learning_rate": 1.426356589147287e-07,
890
+ "logits/chosen": -2.6440272331237793,
891
+ "logits/rejected": -2.6383345127105713,
892
+ "logps/chosen": -63.14166259765625,
893
+ "logps/rejected": -80.24067687988281,
894
+ "loss": 0.3916,
895
+ "rewards/accuracies": 0.4625000059604645,
896
+ "rewards/chosen": 4.34307336807251,
897
+ "rewards/margins": 9.047937393188477,
898
+ "rewards/rejected": -4.704863548278809,
899
+ "step": 530
900
+ },
901
+ {
902
+ "epoch": 1.8815331010452963,
903
+ "grad_norm": 503.79684025145957,
904
+ "learning_rate": 1.3294573643410851e-07,
905
+ "logits/chosen": -2.6588096618652344,
906
+ "logits/rejected": -2.6732683181762695,
907
+ "logps/chosen": -53.71875,
908
+ "logps/rejected": -84.29718780517578,
909
+ "loss": 0.46,
910
+ "rewards/accuracies": 0.44999998807907104,
911
+ "rewards/chosen": 5.3063740730285645,
912
+ "rewards/margins": 11.112980842590332,
913
+ "rewards/rejected": -5.806607246398926,
914
+ "step": 540
915
+ },
916
+ {
917
+ "epoch": 1.916376306620209,
918
+ "grad_norm": 1.5289785290366227,
919
+ "learning_rate": 1.2325581395348835e-07,
920
+ "logits/chosen": -2.636369228363037,
921
+ "logits/rejected": -2.6182668209075928,
922
+ "logps/chosen": -79.08727264404297,
923
+ "logps/rejected": -97.51543426513672,
924
+ "loss": 0.3689,
925
+ "rewards/accuracies": 0.5,
926
+ "rewards/chosen": 6.226182460784912,
927
+ "rewards/margins": 11.859712600708008,
928
+ "rewards/rejected": -5.633530139923096,
929
+ "step": 550
930
+ },
931
+ {
932
+ "epoch": 1.951219512195122,
933
+ "grad_norm": 96.76132190493311,
934
+ "learning_rate": 1.1356589147286824e-07,
935
+ "logits/chosen": -2.64457368850708,
936
+ "logits/rejected": -2.651458740234375,
937
+ "logps/chosen": -55.0025520324707,
938
+ "logps/rejected": -76.64137268066406,
939
+ "loss": 0.3985,
940
+ "rewards/accuracies": 0.4437499940395355,
941
+ "rewards/chosen": 4.620251655578613,
942
+ "rewards/margins": 7.983218193054199,
943
+ "rewards/rejected": -3.362966537475586,
944
+ "step": 560
945
+ },
946
+ {
947
+ "epoch": 1.986062717770035,
948
+ "grad_norm": 13.582049528683124,
949
+ "learning_rate": 1.0387596899224806e-07,
950
+ "logits/chosen": -2.7313754558563232,
951
+ "logits/rejected": -2.702322483062744,
952
+ "logps/chosen": -56.71698760986328,
953
+ "logps/rejected": -67.86329650878906,
954
+ "loss": 0.3789,
955
+ "rewards/accuracies": 0.4437499940395355,
956
+ "rewards/chosen": 4.503007411956787,
957
+ "rewards/margins": 7.31237268447876,
958
+ "rewards/rejected": -2.8093647956848145,
959
+ "step": 570
960
+ },
961
+ {
962
+ "epoch": 2.0,
963
+ "step": 574,
964
+ "total_flos": 0.0,
965
+ "train_loss": 0.5567006942287139,
966
+ "train_runtime": 6469.9642,
967
+ "train_samples_per_second": 5.669,
968
+ "train_steps_per_second": 0.089
969
+ }
970
+ ],
971
+ "logging_steps": 10,
972
+ "max_steps": 574,
973
+ "num_input_tokens_seen": 0,
974
+ "num_train_epochs": 2,
975
+ "save_steps": 100,
976
+ "stateful_callbacks": {
977
+ "TrainerControl": {
978
+ "args": {
979
+ "should_epoch_stop": false,
980
+ "should_evaluate": false,
981
+ "should_log": false,
982
+ "should_save": true,
983
+ "should_training_stop": true
984
+ },
985
+ "attributes": {}
986
+ }
987
+ },
988
+ "total_flos": 0.0,
989
+ "train_batch_size": 8,
990
+ "trial_name": null,
991
+ "trial_params": null
992
+ }