wzhouad committed on
Commit 3009226
1 Parent(s): 626ce43

Model save

README.md CHANGED
@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.4932
- - Rewards/chosen: -3.0120
- - Rewards/rejected: -4.2654
- - Rewards/accuracies: 0.7695
- - Rewards/margins: 1.2534
- - Logps/rejected: -683.8962
- - Logps/chosen: -558.2434
- - Logits/rejected: 0.7844
- - Logits/chosen: 0.2532
+ - Loss: 0.0712
+ - Rewards/chosen: -2.3718
+ - Rewards/rejected: -2.8225
+ - Rewards/accuracies: 0.625
+ - Rewards/margins: 0.4507
+ - Logps/rejected: -539.6053
+ - Logps/chosen: -494.2236
+ - Logits/rejected: -2.2822
+ - Logits/chosen: -2.3030

  ## Model description

@@ -47,7 +47,7 @@ The following hyperparameters were used during training:
  - learning_rate: 5e-07
  - train_batch_size: 8
  - eval_batch_size: 8
- - seed: 1
+ - seed: 3
  - distributed_type: multi-GPU
  - num_devices: 8
  - gradient_accumulation_steps: 2
@@ -62,14 +62,10 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.5391 | 0.11 | 100 | 0.6202 | -0.4534 | -0.7509 | 0.6758 | 0.2975 | -332.4442 | -302.3835 | -2.5452 | -2.5608 |
- | 0.4673 | 0.23 | 200 | 0.5535 | -1.1718 | -1.7624 | 0.7539 | 0.5905 | -433.5890 | -374.2225 | -2.1215 | -2.1572 |
- | 0.4334 | 0.34 | 300 | 0.5339 | -2.2652 | -3.2391 | 0.7461 | 0.9739 | -581.2654 | -483.5594 | -0.1994 | -0.5677 |
- | 0.3964 | 0.45 | 400 | 0.5219 | -2.6343 | -3.7846 | 0.7695 | 1.1503 | -635.8123 | -520.4658 | 0.8270 | 0.2987 |
- | 0.408 | 0.57 | 500 | 0.5032 | -2.1788 | -3.2538 | 0.7773 | 1.0751 | -582.7369 | -474.9173 | 0.1579 | -0.3200 |
- | 0.3955 | 0.68 | 600 | 0.5006 | -2.6604 | -3.8606 | 0.7539 | 1.2002 | -643.4160 | -523.0820 | 0.9437 | 0.3256 |
- | 0.3779 | 0.79 | 700 | 0.4951 | -2.8271 | -4.0892 | 0.7656 | 1.2620 | -666.2689 | -539.7507 | 0.8019 | 0.2515 |
- | 0.3845 | 0.91 | 800 | 0.4932 | -3.0120 | -4.2654 | 0.7695 | 1.2534 | -683.8962 | -558.2434 | 0.7844 | 0.2532 |
+ | 0.0594 | 0.25 | 100 | 0.1035 | -1.7191 | -1.9450 | 0.6172 | 0.2259 | -451.8574 | -428.9503 | -2.3270 | -2.3408 |
+ | 0.0329 | 0.49 | 200 | 0.0693 | -2.4492 | -2.8068 | 0.6094 | 0.3576 | -538.0304 | -501.9568 | -2.2147 | -2.2352 |
+ | 0.0312 | 0.74 | 300 | 0.0689 | -2.4412 | -2.8616 | 0.6133 | 0.4204 | -543.5178 | -501.1634 | -2.2721 | -2.2933 |
+ | 0.0331 | 0.99 | 400 | 0.0712 | -2.3718 | -2.8225 | 0.625 | 0.4507 | -539.6053 | -494.2236 | -2.2822 | -2.3030 |


  ### Framework versions
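The evaluation metric names above match those logged by TRL's `DPOTrainer`; that is an assumption, since this excerpt of the model card does not name the trainer. Under that reading, `Rewards/margins` is the mean of per-pair `chosen - rejected` rewards, so it should equal `Rewards/chosen - Rewards/rejected` up to the 4-decimal rounding of the table. The sketch below checks this for both versions of the training-results table, and shows why a standard sigmoid DPO loss (rewards already scaled by beta) starts at ln 2 ≈ 0.6931, the value in the old run's step-1 entry of `trainer_state.json`.

```python
import math

# (step, rewards/chosen, rewards/rejected, rewards/margins), copied from the
# old (-) and new (+) training-results tables in the README diff.
OLD_EVAL = [
    (100, -0.4534, -0.7509, 0.2975),
    (200, -1.1718, -1.7624, 0.5905),
    (300, -2.2652, -3.2391, 0.9739),
    (400, -2.6343, -3.7846, 1.1503),
    (500, -2.1788, -3.2538, 1.0751),
    (600, -2.6604, -3.8606, 1.2002),
    (700, -2.8271, -4.0892, 1.2620),
    (800, -3.0120, -4.2654, 1.2534),
]
NEW_EVAL = [
    (100, -1.7191, -1.9450, 0.2259),
    (200, -2.4492, -2.8068, 0.3576),
    (300, -2.4412, -2.8616, 0.4204),
    (400, -2.3718, -2.8225, 0.4507),
]

# Each column is rounded to 4 decimals independently, so chosen - rejected can
# differ from the rounded margin by up to three half-units: 1.5e-4.
ROUNDING_TOL = 1.5e-4 + 1e-12


def margins_consistent(rows):
    """True if margins == chosen - rejected (within rounding) for every row."""
    return all(abs((c - r) - m) <= ROUNDING_TOL for _, c, r, m in rows)


def sigmoid_dpo_loss(margin):
    """Per-pair sigmoid DPO loss: -log(sigmoid(margin)) = log(1 + exp(-margin))."""
    return math.log1p(math.exp(-margin))


if __name__ == "__main__":
    print(margins_consistent(OLD_EVAL), margins_consistent(NEW_EVAL))
    # Before any update the policy equals the reference model, so margin = 0.
    print(round(sigmoid_dpo_loss(0.0), 4))
```

Both checks pass; the old rows at steps 200, 500, and 700 are off by 0.0001 only because each column is rounded separately.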
all_results.json CHANGED
@@ -1,8 +1,8 @@
  {
  "epoch": 1.0,
- "train_loss": 0.4371140412269065,
- "train_runtime": 8003.3982,
- "train_samples": 113028,
- "train_samples_per_second": 14.123,
- "train_steps_per_second": 0.11
+ "train_loss": 0.0722552685457983,
+ "train_runtime": 3732.8792,
+ "train_samples": 51894,
+ "train_samples_per_second": 13.902,
+ "train_steps_per_second": 0.108
  }
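The throughput fields in `all_results.json` follow from the others: samples per second is `train_samples / train_runtime`, and steps per second is the optimizer-step count over the same runtime. The step count can be estimated from the README hyperparameters (8 devices × per-device batch 8 × 2 accumulation steps = 128 samples per optimizer step). This sketch assumes each rank gets an equal, padded share of the data, the last partial per-device batch is kept, and only full accumulation windows count as optimizer steps; under those assumptions it reproduces the old run's `global_step` of 883 and its fractional `epoch` of 0.99943... in `trainer_state.json`. The new run's step count is not visible in this excerpt; the estimate of 405 is consistent with its `train_steps_per_second` of 0.108 and with the README table reaching epoch 0.99 at step 400.

```python
import math


def optimizer_steps_per_epoch(num_samples, num_devices, per_device_batch, grad_accum):
    """Estimate (whole optimizer steps, fractional steps) for one epoch.

    Assumptions, not confirmed by the commit: samples are split across ranks
    with padding to equal shards, the last partial batch is kept, and only
    full gradient-accumulation windows trigger an optimizer step.
    """
    samples_per_rank = math.ceil(num_samples / num_devices)
    batches_per_rank = math.ceil(samples_per_rank / per_device_batch)
    return batches_per_rank // grad_accum, batches_per_rank / grad_accum


def throughput(num_samples, num_steps, runtime_s):
    """Samples/s and steps/s, rounded to 3 decimals like the results JSON."""
    return round(num_samples / runtime_s, 3), round(num_steps / runtime_s, 3)


if __name__ == "__main__":
    # train_samples and train_runtime from all_results.json (old, then new run).
    runs = {"old": (113028, 8003.3982), "new": (51894, 3732.8792)}
    for name, (samples, runtime) in runs.items():
        steps, steps_float = optimizer_steps_per_epoch(samples, 8, 8, 2)
        print(name, steps, throughput(samples, steps, runtime), steps / steps_float)
```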
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3391c5b96744d7303ee87811d01d8f2910d44fb430606c4813a529a5d5a69231
+ oid sha256:df5f8551f34bd5fa2c36c62a9e1e02db72d830f8b080213c70f6615d9f81b129
  size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d99874ac21f1a7d803f6612b2fb6053974a8bb434b21739731879d25c543309f
+ oid sha256:ec4273703f96f7c3f2cf7aaa5e04be8cea024440c644d9e1a06f6ec8a234f06a
  size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:26b5b6bca1e7fa79c661c1c327d3a33daa88727d89abea163dd08eac60edba0a
+ oid sha256:9d1a5ffde550f3d000a899abb7c1f554363bee7053e537892516534d9b1b6cf9
  size 4540516344
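Each `.safetensors` shard above is stored as a Git LFS pointer, so the diff only changes the `oid` (the SHA-256 of the file contents) while `size` stays the same, which is consistent with the retrained weights keeping the same tensor names, shapes, and dtypes but holding different values. A downloaded shard can be checked against its pointer with a short script. This is a minimal sketch using only the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of any Hugging Face or Git LFS tool.

```python
import hashlib
import os
import tempfile


def parse_lfs_pointer(text):
    """Parse Git LFS pointer text into {'version', 'oid', 'size'}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """True if the file at `path` has the pointer's byte size and SHA-256 digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; avoids hashing a multi-GB shard needlessly
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so large shards are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


# New pointer for model-00001-of-00003.safetensors, as shown in this commit.
NEW_SHARD_1 = """\
version https://git-lfs.github.com/spec/v1
oid sha256:df5f8551f34bd5fa2c36c62a9e1e02db72d830f8b080213c70f6615d9f81b129
size 4943162336
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(NEW_SHARD_1))
    # Self-check on a small temporary file instead of a real 4.9 GB shard.
    payload = b"example bytes"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    pointer = {"version": "", "oid": hashlib.sha256(payload).hexdigest(), "size": len(payload)}
    print(matches_pointer(tmp.name, pointer))
    os.remove(tmp.name)
```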
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
  "epoch": 1.0,
- "train_loss": 0.4371140412269065,
- "train_runtime": 8003.3982,
- "train_samples": 113028,
- "train_samples_per_second": 14.123,
- "train_steps_per_second": 0.11
+ "train_loss": 0.0722552685457983,
+ "train_runtime": 3732.8792,
+ "train_samples": 51894,
+ "train_samples_per_second": 13.902,
+ "train_steps_per_second": 0.108
  }
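The `learning_rate` values logged for the old run in `trainer_state.json` fit a linear warmup from 0 to the README's peak of 5e-07 over 89 steps, followed by cosine decay to 0 at `global_step` 883. The 89-step warmup is inferred from the logs (it equals ceil(0.1 × 883), i.e. a warmup ratio of 0.1), and the scheduler type is not visible in this excerpt, so treat both as assumptions. The sketch mirrors the multiplier used by `get_cosine_schedule_with_warmup` in Hugging Face Transformers (half a cosine cycle) and reproduces the logged values at steps 1, 10, 90, 100, and 200.

```python
import math

PEAK_LR = 5e-07     # learning_rate from the README hyperparameters
WARMUP_STEPS = 89   # inferred from the logged values (assumption)
TOTAL_STEPS = 883   # global_step of the old run


def cosine_with_warmup_lr(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to `peak`, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return peak * (step / max(1, warmup))
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # (step, learning_rate) pairs copied from the old run's log_history.
    logged = [
        (1, 5.617977528089887e-09),
        (10, 5.617977528089887e-08),
        (90, 4.999980431020109e-07),
        (100, 4.997632524101301e-07),
        (200, 4.7627410443782887e-07),
    ]
    for step, lr in logged:
        print(step, lr, math.isclose(cosine_with_warmup_lr(step), lr, rel_tol=1e-9))
```

The step-1 value alone pins the warmup length (5e-07 / 89), and the post-warmup values only line up when the decay ends exactly at step 883.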
trainer_state.json CHANGED
@@ -1,21 +1,21 @@
  {
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 0.9994340690435767,
  "eval_steps": 100,
- "global_step": 883,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
  "epoch": 0.0,
- "learning_rate": 5.617977528089887e-09,
- "logits/chosen": -2.763059616088867,
- "logits/rejected": -2.7395401000976562,
- "logps/chosen": -322.45367431640625,
- "logps/rejected": -273.0731506347656,
- "loss": 0.6931,
  "rewards/accuracies": 0.0,
  "rewards/chosen": 0.0,
  "rewards/margins": 0.0,
@@ -23,1377 +23,641 @@
  "step": 1
  },
  {
- "epoch": 0.01,
- "learning_rate": 5.617977528089887e-08,
- "logits/chosen": -2.7941672801971436,
- "logits/rejected": -2.771027088165283,
- "logps/chosen": -334.48358154296875,
- "logps/rejected": -186.61041259765625,
- "loss": 0.693,
- "rewards/accuracies": 0.4861111044883728,
- "rewards/chosen": 1.1024479135812726e-05,
- "rewards/margins": 6.540949925692985e-06,
- "rewards/rejected": 4.483505108510144e-06,
  "step": 10
  },
  {
- "epoch": 0.02,
- "learning_rate": 1.1235955056179774e-07,
- "logits/chosen": -2.821061134338379,
- "logits/rejected": -2.800480842590332,
- "logps/chosen": -334.4288635253906,
- "logps/rejected": -174.1417999267578,
- "loss": 0.6918,
- "rewards/accuracies": 0.5625,
- "rewards/chosen": 0.0012656518956646323,
- "rewards/margins": 0.0027236624155193567,
- "rewards/rejected": -0.0014580106362700462,
  "step": 20
  },
  {
- "epoch": 0.03,
- "learning_rate": 1.6853932584269663e-07,
- "logits/chosen": -2.7578957080841064,
- "logits/rejected": -2.745757579803467,
- "logps/chosen": -318.7159423828125,
- "logps/rejected": -187.8983917236328,
- "loss": 0.6838,
- "rewards/accuracies": 0.6812499761581421,
- "rewards/chosen": 0.008920788764953613,
- "rewards/margins": 0.017777040600776672,
- "rewards/rejected": -0.008856252767145634,
  "step": 30
  },
  {
- "epoch": 0.05,
- "learning_rate": 2.2471910112359549e-07,
- "logits/chosen": -2.7689571380615234,
- "logits/rejected": -2.728093385696411,
- "logps/chosen": -361.4432678222656,
- "logps/rejected": -208.02230834960938,
- "loss": 0.6673,
- "rewards/accuracies": 0.65625,
- "rewards/chosen": 0.024761155247688293,
- "rewards/margins": 0.05898100882768631,
- "rewards/rejected": -0.034219853579998016,
  "step": 40
  },
  {
- "epoch": 0.06,
- "learning_rate": 2.8089887640449437e-07,
- "logits/chosen": -2.6803011894226074,
- "logits/rejected": -2.669233560562134,
- "logps/chosen": -289.77728271484375,
- "logps/rejected": -173.2029266357422,
- "loss": 0.6387,
- "rewards/accuracies": 0.699999988079071,
- "rewards/chosen": 0.024808162823319435,
- "rewards/margins": 0.10643555223941803,
- "rewards/rejected": -0.08162739127874374,
  "step": 50
  },
  {
- "epoch": 0.07,
- "learning_rate": 3.3707865168539325e-07,
- "logits/chosen": -2.5601308345794678,
- "logits/rejected": -2.552110433578491,
- "logps/chosen": -327.4698181152344,
- "logps/rejected": -226.25082397460938,
- "loss": 0.6222,
- "rewards/accuracies": 0.6812499761581421,
- "rewards/chosen": -0.010004254989326,
- "rewards/margins": 0.1619780957698822,
- "rewards/rejected": -0.17198236286640167,
  "step": 60
  },
  {
- "epoch": 0.08,
- "learning_rate": 3.9325842696629214e-07,
- "logits/chosen": -2.5909500122070312,
- "logits/rejected": -2.5680363178253174,
- "logps/chosen": -337.80718994140625,
- "logps/rejected": -248.06088256835938,
- "loss": 0.5955,
- "rewards/accuracies": 0.6812499761581421,
- "rewards/chosen": -0.06871125102043152,
- "rewards/margins": 0.2629837989807129,
- "rewards/rejected": -0.331695020198822,
  "step": 70
  },
  {
- "epoch": 0.09,
- "learning_rate": 4.4943820224719097e-07,
- "logits/chosen": -2.5084314346313477,
- "logits/rejected": -2.5084025859832764,
- "logps/chosen": -391.24908447265625,
- "logps/rejected": -225.06039428710938,
- "loss": 0.572,
- "rewards/accuracies": 0.699999988079071,
- "rewards/chosen": -0.0807892456650734,
- "rewards/margins": 0.4122789800167084,
- "rewards/rejected": -0.4930681586265564,
  "step": 80
  },
  {
- "epoch": 0.1,
- "learning_rate": 4.999980431020109e-07,
- "logits/chosen": -2.502631664276123,
- "logits/rejected": -2.4763102531433105,
- "logps/chosen": -356.6223449707031,
- "logps/rejected": -256.1166076660156,
- "loss": 0.5511,
- "rewards/accuracies": 0.8062499761581421,
- "rewards/chosen": -0.06631435453891754,
- "rewards/margins": 0.5667875409126282,
- "rewards/rejected": -0.6331019401550293,
  "step": 90
  },
  {
- "epoch": 0.11,
- "learning_rate": 4.997632524101301e-07,
- "logits/chosen": -2.535919666290283,
- "logits/rejected": -2.499262571334839,
- "logps/chosen": -368.29644775390625,
- "logps/rejected": -279.8140869140625,
- "loss": 0.5391,
- "rewards/accuracies": 0.699999988079071,
- "rewards/chosen": -0.379818320274353,
- "rewards/margins": 0.5510101914405823,
- "rewards/rejected": -0.9308284521102905,
  "step": 100
  },
  {
- "epoch": 0.11,
- "eval_logits/chosen": -2.5607945919036865,
- "eval_logits/rejected": -2.5451953411102295,
- "eval_logps/chosen": -302.3835144042969,
- "eval_logps/rejected": -332.4442443847656,
- "eval_loss": 0.6201537251472473,
- "eval_rewards/accuracies": 0.67578125,
- "eval_rewards/chosen": -0.4534388780593872,
- "eval_rewards/margins": 0.29747113585472107,
- "eval_rewards/rejected": -0.7509099245071411,
- "eval_runtime": 53.0727,
- "eval_samples_per_second": 37.684,
- "eval_steps_per_second": 0.603,
  "step": 100
  },
  {
- "epoch": 0.12,
- "learning_rate": 4.991375032514749e-07,
- "logits/chosen": -2.511805772781372,
- "logits/rejected": -2.4707462787628174,
- "logps/chosen": -340.9619140625,
- "logps/rejected": -279.88531494140625,
- "loss": 0.5227,
- "rewards/accuracies": 0.706250011920929,
- "rewards/chosen": -0.5024222135543823,
- "rewards/margins": 0.5474096536636353,
- "rewards/rejected": -1.0498319864273071,
  "step": 110
  },
  {
- "epoch": 0.14,
- "learning_rate": 4.98121775121344e-07,
- "logits/chosen": -2.3996121883392334,
- "logits/rejected": -2.360802173614502,
- "logps/chosen": -405.9101257324219,
- "logps/rejected": -361.8677978515625,
- "loss": 0.4868,
- "rewards/accuracies": 0.768750011920929,
- "rewards/chosen": -0.7274013757705688,
- "rewards/margins": 0.7646517157554626,
- "rewards/rejected": -1.4920530319213867,
  "step": 120
  },
  {
- "epoch": 0.15,
- "learning_rate": 4.96717657955441e-07,
- "logits/chosen": -2.341158866882324,
- "logits/rejected": -2.274775266647339,
- "logps/chosen": -437.23492431640625,
- "logps/rejected": -369.3250732421875,
- "loss": 0.4645,
- "rewards/accuracies": 0.706250011920929,
- "rewards/chosen": -0.8677096366882324,
- "rewards/margins": 0.8753921389579773,
- "rewards/rejected": -1.743101716041565,
  "step": 130
  },
  {
- "epoch": 0.16,
- "learning_rate": 4.949273496411216e-07,
- "logits/chosen": -2.2799994945526123,
- "logits/rejected": -2.2343602180480957,
- "logps/chosen": -412.9886779785156,
- "logps/rejected": -377.98553466796875,
- "loss": 0.4708,
- "rewards/accuracies": 0.78125,
- "rewards/chosen": -0.910749077796936,
- "rewards/margins": 0.953301727771759,
- "rewards/rejected": -1.8640508651733398,
  "step": 140
  },
  {
- "epoch": 0.17,
- "learning_rate": 4.927536525770046e-07,
- "logits/chosen": -2.1459238529205322,
- "logits/rejected": -2.061415910720825,
- "logps/chosen": -435.0037536621094,
- "logps/rejected": -397.0843505859375,
- "loss": 0.473,
- "rewards/accuracies": 0.78125,
- "rewards/chosen": -1.050859808921814,
- "rewards/margins": 0.9770351648330688,
- "rewards/rejected": -2.027894973754883,
  "step": 150
  },
  {
- "epoch": 0.18,
- "learning_rate": 4.901999692863326e-07,
- "logits/chosen": -2.124925374984741,
- "logits/rejected": -2.0580692291259766,
- "logps/chosen": -444.2850036621094,
- "logps/rejected": -368.23089599609375,
- "loss": 0.4469,
- "rewards/accuracies": 0.7562500238418579,
- "rewards/chosen": -0.8980700373649597,
- "rewards/margins": 0.9867424964904785,
- "rewards/rejected": -1.884812593460083,
  "step": 160
  },
  {
- "epoch": 0.19,
- "learning_rate": 4.872702970909464e-07,
- "logits/chosen": -2.2076802253723145,
- "logits/rejected": -2.0857512950897217,
- "logps/chosen": -454.78509521484375,
- "logps/rejected": -420.0375061035156,
- "loss": 0.4679,
- "rewards/accuracies": 0.7875000238418579,
- "rewards/chosen": -0.9711505770683289,
- "rewards/margins": 1.3920787572860718,
- "rewards/rejected": -2.363229274749756,
  "step": 170
  },
  {
- "epoch": 0.2,
- "learning_rate": 4.839692218542131e-07,
- "logits/chosen": -2.1690759658813477,
- "logits/rejected": -2.098189353942871,
- "logps/chosen": -408.5738830566406,
- "logps/rejected": -394.4034729003906,
- "loss": 0.4665,
- "rewards/accuracies": 0.8062499761581421,
- "rewards/chosen": -0.8100764155387878,
- "rewards/margins": 1.1612640619277954,
- "rewards/rejected": -1.9713407754898071,
  "step": 180
  },
  {
- "epoch": 0.22,
- "learning_rate": 4.803019108026997e-07,
- "logits/chosen": -2.1986196041107178,
- "logits/rejected": -2.112959861755371,
- "logps/chosen": -450.8468322753906,
- "logps/rejected": -383.9068603515625,
- "loss": 0.477,
- "rewards/accuracies": 0.831250011920929,
- "rewards/chosen": -0.6421102285385132,
- "rewards/margins": 1.2314198017120361,
- "rewards/rejected": -1.8735300302505493,
  "step": 190
  },
  {
- "epoch": 0.23,
- "learning_rate": 4.7627410443782887e-07,
- "logits/chosen": -2.1900177001953125,
- "logits/rejected": -2.099222183227539,
- "logps/chosen": -391.14959716796875,
- "logps/rejected": -341.2007751464844,
- "loss": 0.4673,
- "rewards/accuracies": 0.7562500238418579,
- "rewards/chosen": -0.6667279005050659,
- "rewards/margins": 0.9072163701057434,
- "rewards/rejected": -1.573944330215454,
  "step": 200
  },
  {
- "epoch": 0.23,
- "eval_logits/chosen": -2.1572117805480957,
- "eval_logits/rejected": -2.121535301208496,
- "eval_logps/chosen": -374.22247314453125,
- "eval_logps/rejected": -433.5889587402344,
- "eval_loss": 0.5535483360290527,
- "eval_rewards/accuracies": 0.75390625,
- "eval_rewards/chosen": -1.171828269958496,
- "eval_rewards/margins": 0.5905283689498901,
- "eval_rewards/rejected": -1.7623566389083862,
- "eval_runtime": 52.9266,
- "eval_samples_per_second": 37.788,
- "eval_steps_per_second": 0.605,
  "step": 200
  },
  {
- "epoch": 0.24,
- "learning_rate": 4.7189210755018034e-07,
- "logits/chosen": -2.11592435836792,
- "logits/rejected": -2.0587477684020996,
- "logps/chosen": -453.43475341796875,
- "logps/rejected": -418.6344299316406,
- "loss": 0.4362,
- "rewards/accuracies": 0.800000011920929,
- "rewards/chosen": -0.9801640510559082,
- "rewards/margins": 1.1953608989715576,
- "rewards/rejected": -2.175525188446045,
  "step": 210
  },
  {
- "epoch": 0.25,
- "learning_rate": 4.671627793504988e-07,
- "logits/chosen": -2.0989413261413574,
- "logits/rejected": -2.0104451179504395,
- "logps/chosen": -431.24774169921875,
- "logps/rejected": -397.3530578613281,
- "loss": 0.4357,
- "rewards/accuracies": 0.78125,
- "rewards/chosen": -0.937910258769989,
- "rewards/margins": 1.2375946044921875,
- "rewards/rejected": -2.1755049228668213,
  "step": 220
  },
  {
- "epoch": 0.26,
- "learning_rate": 4.6209352273286095e-07,
- "logits/chosen": -1.967944860458374,
- "logits/rejected": -1.88350510597229,
- "logps/chosen": -476.22149658203125,
- "logps/rejected": -449.72564697265625,
- "loss": 0.4319,
- "rewards/accuracies": 0.7562500238418579,
- "rewards/chosen": -1.2830114364624023,
- "rewards/margins": 1.1820969581604004,
- "rewards/rejected": -2.465108633041382,
  "step": 230
  },
  {
- "epoch": 0.27,
- "learning_rate": 4.56692272686805e-07,
- "logits/chosen": -1.943704605102539,
- "logits/rejected": -1.8378359079360962,
- "logps/chosen": -490.98834228515625,
- "logps/rejected": -473.5677795410156,
- "loss": 0.4459,
- "rewards/accuracies": 0.800000011920929,
- "rewards/chosen": -1.4955508708953857,
- "rewards/margins": 1.398730993270874,
- "rewards/rejected": -2.8942818641662598,
  "step": 240
  },
  {
- "epoch": 0.28,
- "learning_rate": 4.5096748387656326e-07,
- "logits/chosen": -1.9319934844970703,
- "logits/rejected": -1.848971962928772,
- "logps/chosen": -431.0406188964844,
- "logps/rejected": -432.92791748046875,
- "loss": 0.4259,
- "rewards/accuracies": 0.75,
- "rewards/chosen": -1.2432794570922852,
- "rewards/margins": 1.096217393875122,
- "rewards/rejected": -2.3394968509674072,
  "step": 250
  },
  {
- "epoch": 0.29,
- "learning_rate": 4.4492811740683877e-07,
- "logits/chosen": -2.0316851139068604,
- "logits/rejected": -1.8930383920669556,
- "logps/chosen": -479.8583984375,
- "logps/rejected": -440.8379821777344,
- "loss": 0.4478,
- "rewards/accuracies": 0.8062499761581421,
- "rewards/chosen": -1.304606318473816,
- "rewards/margins": 1.1949961185455322,
- "rewards/rejected": -2.4996025562286377,
  "step": 260
  },
  {
- "epoch": 0.31,
- "learning_rate": 4.3858362679584354e-07,
- "logits/chosen": -1.6744228601455688,
- "logits/rejected": -1.3835828304290771,
- "logps/chosen": -499.75677490234375,
- "logps/rejected": -487.6814880371094,
- "loss": 0.4322,
- "rewards/accuracies": 0.768750011920929,
- "rewards/chosen": -1.8357969522476196,
- "rewards/margins": 1.239199161529541,
- "rewards/rejected": -3.0749964714050293,
  "step": 270
  },
  {
- "epoch": 0.32,
- "learning_rate": 4.3194394317755245e-07,
- "logits/chosen": -1.2927743196487427,
- "logits/rejected": -1.0211999416351318,
- "logps/chosen": -465.1715393066406,
- "logps/rejected": -479.52764892578125,
- "loss": 0.4306,
- "rewards/accuracies": 0.78125,
- "rewards/chosen": -1.67030930519104,
- "rewards/margins": 1.3314597606658936,
- "rewards/rejected": -3.0017685890197754,
  "step": 280
  },
  {
- "epoch": 0.33,
- "learning_rate": 4.2501945975633914e-07,
- "logits/chosen": -1.036027431488037,
- "logits/rejected": -0.36450880765914917,
- "logps/chosen": -535.3629760742188,
- "logps/rejected": -495.39324951171875,
- "loss": 0.4274,
- "rewards/accuracies": 0.78125,
- "rewards/chosen": -1.837786316871643,
- "rewards/margins": 1.4202905893325806,
- "rewards/rejected": -3.2580769062042236,
  "step": 290
  },
  {
- "epoch": 0.34,
- "learning_rate": 4.1782101553832405e-07,
- "logits/chosen": -0.958489716053009,
- "logits/rejected": -0.3055742084980011,
- "logps/chosen": -548.3148193359375,
- "logps/rejected": -564.0227661132812,
- "loss": 0.4334,
- "rewards/accuracies": 0.793749988079071,
- "rewards/chosen": -1.9176700115203857,
- "rewards/margins": 1.585371732711792,
- "rewards/rejected": -3.5030417442321777,
  "step": 300
  },
  {
- "epoch": 0.34,
- "eval_logits/chosen": -0.5676769018173218,
- "eval_logits/rejected": -0.19943884015083313,
- "eval_logps/chosen": -483.5593566894531,
- "eval_logps/rejected": -581.2654418945312,
- "eval_loss": 0.533881425857544,
- "eval_rewards/accuracies": 0.74609375,
- "eval_rewards/chosen": -2.2651968002319336,
- "eval_rewards/margins": 0.9739242196083069,
- "eval_rewards/rejected": -3.239121437072754,
- "eval_runtime": 52.9469,
- "eval_samples_per_second": 37.774,
- "eval_steps_per_second": 0.604,
  "step": 300
  },
  {
- "epoch": 0.35,
- "learning_rate": 4.103598783649029e-07,
- "logits/chosen": -0.8055219650268555,
- "logits/rejected": -0.09858167171478271,
- "logps/chosen": -491.8812561035156,
- "logps/rejected": -500.9165954589844,
- "loss": 0.4204,
- "rewards/accuracies": 0.824999988079071,
- "rewards/chosen": -1.5920034646987915,
- "rewards/margins": 1.5368704795837402,
- "rewards/rejected": -3.1288740634918213,
  "step": 310
  },
  {
- "epoch": 0.36,
- "learning_rate": 4.026477272750119e-07,
- "logits/chosen": -0.9476078748703003,
- "logits/rejected": -0.4249343276023865,
- "logps/chosen": -513.9217529296875,
- "logps/rejected": -506.89044189453125,
- "loss": 0.4119,
- "rewards/accuracies": 0.824999988079071,
- "rewards/chosen": -1.5674420595169067,
- "rewards/margins": 1.4193658828735352,
- "rewards/rejected": -2.9868078231811523,
  "step": 320
  },
  {
- "epoch": 0.37,
- "learning_rate": 3.9469663422373864e-07,
- "logits/chosen": -0.6462847590446472,
- "logits/rejected": 0.0693933516740799,
- "logps/chosen": -495.73223876953125,
- "logps/rejected": -514.7813720703125,
- "loss": 0.4045,
- "rewards/accuracies": 0.78125,
- "rewards/chosen": -1.974498987197876,
- "rewards/margins": 1.417799472808838,
- "rewards/rejected": -3.392298936843872,
  "step": 330
  },
  {
- "epoch": 0.38,
- "learning_rate": 3.865190451858954e-07,
- "logits/chosen": -0.5255932211875916,
- "logits/rejected": 0.2057991325855255,
- "logps/chosen": -571.3324584960938,
- "logps/rejected": -582.94384765625,
- "loss": 0.4021,
  "rewards/accuracies": 0.800000011920929,
- "rewards/chosen": -2.0892231464385986,
- "rewards/margins": 1.7780288457870483,
- "rewards/rejected": -3.8672518730163574,
  "step": 340
  },
  {
- "epoch": 0.4,
- "learning_rate": 3.781277606741327e-07,
- "logits/chosen": -0.9264996647834778,
- "logits/rejected": -0.43614667654037476,
- "logps/chosen": -430.67626953125,
- "logps/rejected": -460.7449645996094,
- "loss": 0.4308,
- "rewards/accuracies": 0.7562500238418579,
- "rewards/chosen": -1.4353458881378174,
- "rewards/margins": 1.2627815008163452,
- "rewards/rejected": -2.698127269744873,
  "step": 350
  },
  {
- "epoch": 0.41,
- "learning_rate": 3.6953591570208996e-07,
- "logits/chosen": -0.40989646315574646,
- "logits/rejected": 0.31492868065834045,
- "logps/chosen": -503.3641662597656,
- "logps/rejected": -493.41302490234375,
- "loss": 0.4142,
- "rewards/accuracies": 0.793749988079071,
- "rewards/chosen": -1.548743486404419,
- "rewards/margins": 1.5001083612442017,
- "rewards/rejected": -3.048851728439331,
  "step": 360
  },
  {
- "epoch": 0.42,
- "learning_rate": 3.607569592239452e-07,
- "logits/chosen": -0.1845168173313141,
- "logits/rejected": 0.6697748899459839,
- "logps/chosen": -489.48187255859375,
- "logps/rejected": -486.579345703125,
- "loss": 0.4112,
- "rewards/accuracies": 0.800000011920929,
- "rewards/chosen": -1.562912940979004,
- "rewards/margins": 1.469349980354309,
- "rewards/rejected": -3.0322628021240234,
  "step": 370
  },
  {
- "epoch": 0.43,
- "learning_rate": 3.518046330825494e-07,
- "logits/chosen": -0.2910882830619812,
- "logits/rejected": 0.2493607997894287,
- "logps/chosen": -502.5550231933594,
- "logps/rejected": -531.24169921875,
- "loss": 0.4086,
- "rewards/accuracies": 0.7875000238418579,
- "rewards/chosen": -1.6785764694213867,
- "rewards/margins": 1.3167986869812012,
- "rewards/rejected": -2.995375156402588,
  "step": 380
  },
  {
- "epoch": 0.44,
- "learning_rate": 3.4269295049909713e-07,
- "logits/chosen": 0.040309689939022064,
- "logits/rejected": 0.8773566484451294,
- "logps/chosen": -483.1021423339844,
- "logps/rejected": -523.5382080078125,
- "loss": 0.4067,
- "rewards/accuracies": 0.7749999761581421,
- "rewards/chosen": -1.81940495967865,
- "rewards/margins": 1.4445574283599854,
- "rewards/rejected": -3.2639622688293457,
  "step": 390
  },
  {
- "epoch": 0.45,
- "learning_rate": 3.3343617413800453e-07,
- "logits/chosen": 0.04438358172774315,
- "logits/rejected": 0.899645984172821,
- "logps/chosen": -558.7957153320312,
- "logps/rejected": -540.1268920898438,
- "loss": 0.3964,
- "rewards/accuracies": 0.7875000238418579,
- "rewards/chosen": -1.9898704290390015,
- "rewards/margins": 1.5580358505249023,
- "rewards/rejected": -3.5479063987731934,
  "step": 400
  },
  {
- "epoch": 0.45,
- "eval_logits/chosen": 0.29868921637535095,
- "eval_logits/rejected": 0.8269697427749634,
- "eval_logps/chosen": -520.4657592773438,
- "eval_logps/rejected": -635.8123168945312,
- "eval_loss": 0.521929144859314,
- "eval_rewards/accuracies": 0.76953125,
- "eval_rewards/chosen": -2.634261131286621,
- "eval_rewards/margins": 1.1503297090530396,
- "eval_rewards/rejected": -3.784590721130371,
- "eval_runtime": 52.8903,
- "eval_samples_per_second": 37.814,
- "eval_steps_per_second": 0.605,
  "step": 400
  },
- {
- "epoch": 0.46,
- "learning_rate": 3.2404879378132893e-07,
- "logits/chosen": -0.08447281271219254,
- "logits/rejected": 0.5970763564109802,
- "logps/chosen": -498.86090087890625,
- "logps/rejected": -592.9935302734375,
- "loss": 0.4032,
- "rewards/accuracies": 0.84375,
- "rewards/chosen": -2.0137200355529785,
- "rewards/margins": 1.7450697422027588,
- "rewards/rejected": -3.758789539337158,
- "step": 410
- },
- {
- "epoch": 0.48,
- "learning_rate": 3.1454550364767894e-07,
- "logits/chosen": -0.49112820625305176,
- "logits/rejected": 0.10465432703495026,
- "logps/chosen": -496.7327575683594,
- "logps/rejected": -490.85614013671875,
- "loss": 0.4004,
- "rewards/accuracies": 0.856249988079071,
- "rewards/chosen": -1.4156886339187622,
- "rewards/margins": 1.7538082599639893,
- "rewards/rejected": -3.169497013092041,
- "step": 420
- },
- {
- "epoch": 0.49,
- "learning_rate": 3.049411793911154e-07,
- "logits/chosen": -0.4658167362213135,
- "logits/rejected": 0.0734957754611969,
- "logps/chosen": -545.6218872070312,
- "logps/rejected": -557.7208862304688,
- "loss": 0.4061,
- "rewards/accuracies": 0.8187500238418579,
- "rewards/chosen": -1.908559799194336,
- "rewards/margins": 1.4848883152008057,
- "rewards/rejected": -3.3934478759765625,
- "step": 430
- },
- {
- "epoch": 0.5,
- "learning_rate": 2.9525085481604914e-07,
- "logits/chosen": -0.30549541115760803,
- "logits/rejected": 0.43061351776123047,
- "logps/chosen": -545.5732421875,
- "logps/rejected": -576.8668212890625,
- "loss": 0.4013,
- "rewards/accuracies": 0.887499988079071,
- "rewards/chosen": -1.9651981592178345,
- "rewards/margins": 1.824914574623108,
- "rewards/rejected": -3.7901129722595215,
- "step": 440
- },
- {
- "epoch": 0.51,
- "learning_rate": 2.854896983445833e-07,
- "logits/chosen": -0.31081053614616394,
- "logits/rejected": 0.8431285619735718,
- "logps/chosen": -550.8968505859375,
- "logps/rejected": -542.3270263671875,
- "loss": 0.4075,
- "rewards/accuracies": 0.8125,
- "rewards/chosen": -1.837031364440918,
- "rewards/margins": 1.7019197940826416,
- "rewards/rejected": -3.5389511585235596,
- "step": 450
- },
- {
- "epoch": 0.52,
- "learning_rate": 2.7567298927313654e-07,
- "logits/chosen": -0.2802823781967163,
- "logits/rejected": 0.6037198901176453,
- "logps/chosen": -514.4429931640625,
- "logps/rejected": -506.43316650390625,
- "loss": 0.4124,
- "rewards/accuracies": 0.8125,
- "rewards/chosen": -1.8379509449005127,
- "rewards/margins": 1.5676665306091309,
- "rewards/rejected": -3.4056172370910645,
- "step": 460
- },
- {
- "epoch": 0.53,
- "learning_rate": 2.658160938555123e-07,
- "logits/chosen": -0.529831051826477,
- "logits/rejected": -0.08401882648468018,
- "logps/chosen": -504.21368408203125,
- "logps/rejected": -535.26904296875,
- "loss": 0.4155,
- "rewards/accuracies": 0.800000011920929,
- "rewards/chosen": -1.7486238479614258,
- "rewards/margins": 1.596998691558838,
- "rewards/rejected": -3.345623016357422,
- "step": 470
- },
- {
- "epoch": 0.54,
- "learning_rate": 2.559344412498532e-07,
- "logits/chosen": -0.7518737316131592,
- "logits/rejected": -0.24171645939350128,
- "logps/chosen": -513.7047119140625,
- "logps/rejected": -491.68536376953125,
- "loss": 0.3966,
- "rewards/accuracies": 0.8187500238418579,
- "rewards/chosen": -1.6572511196136475,
- "rewards/margins": 1.5054905414581299,
- "rewards/rejected": -3.1627418994903564,
- "step": 480
- },
- {
- "epoch": 0.55,
- "learning_rate": 2.460434993671294e-07,
- "logits/chosen": -0.6732519865036011,
- "logits/rejected": -0.06187018007040024,
- "logps/chosen": -510.64093017578125,
- "logps/rejected": -511.0845642089844,
- "loss": 0.4376,
- "rewards/accuracies": 0.7749999761581421,
- "rewards/chosen": -1.9079630374908447,
- "rewards/margins": 1.5278714895248413,
- "rewards/rejected": -3.4358341693878174,
- "step": 490
- },
- {
- "epoch": 0.57,
- "learning_rate": 2.361587506589672e-07,
- "logits/chosen": -0.6487066745758057,
- "logits/rejected": -0.08703817427158356,
- "logps/chosen": -523.2932739257812,
- "logps/rejected": -509.58917236328125,
- "loss": 0.408,
- "rewards/accuracies": 0.768750011920929,
- "rewards/chosen": -1.6383225917816162,
- "rewards/margins": 1.5625841617584229,
- "rewards/rejected": -3.2009072303771973,
- "step": 500
- },
- {
- "epoch": 0.57,
- "eval_logits/chosen": -0.3200441896915436,
- "eval_logits/rejected": 0.15786468982696533,
- "eval_logps/chosen": -474.9172668457031,
- "eval_logps/rejected": -582.7368774414062,
- "eval_loss": 0.5031983256340027,
- "eval_rewards/accuracies": 0.77734375,
- "eval_rewards/chosen": -2.1787757873535156,
- "eval_rewards/margins": 1.0750598907470703,
- "eval_rewards/rejected": -3.253835916519165,
- "eval_runtime": 52.9576,
- "eval_samples_per_second": 37.766,
- "eval_steps_per_second": 0.604,
- "step": 500
- },
- {
- "epoch": 0.58,
- "learning_rate": 2.2629566788271613e-07,
- "logits/chosen": -0.20834016799926758,
- "logits/rejected": 0.5500332713127136,
- "logps/chosen": -518.3438720703125,
- "logps/rejected": -521.8748168945312,
812
- "loss": 0.379,
813
- "rewards/accuracies": 0.793749988079071,
814
- "rewards/chosen": -1.8442401885986328,
815
- "rewards/margins": 1.6085563898086548,
816
- "rewards/rejected": -3.452796459197998,
817
- "step": 510
818
- },
819
- {
820
- "epoch": 0.59,
821
- "learning_rate": 2.1646968988169135e-07,
822
- "logits/chosen": 0.15665414929389954,
823
- "logits/rejected": 0.9071012735366821,
824
- "logps/chosen": -497.3817443847656,
825
- "logps/rejected": -510.22625732421875,
826
- "loss": 0.4174,
827
- "rewards/accuracies": 0.8374999761581421,
828
- "rewards/chosen": -1.7797693014144897,
829
- "rewards/margins": 1.6646064519882202,
830
- "rewards/rejected": -3.444375514984131,
831
- "step": 520
832
- },
833
- {
834
- "epoch": 0.6,
835
- "learning_rate": 2.0669619741850232e-07,
836
- "logits/chosen": -0.15719492733478546,
837
- "logits/rejected": 0.7549746632575989,
838
- "logps/chosen": -552.4660034179688,
839
- "logps/rejected": -554.6041259765625,
840
- "loss": 0.4142,
841
- "rewards/accuracies": 0.8187500238418579,
842
- "rewards/chosen": -1.8783395290374756,
843
- "rewards/margins": 1.8778337240219116,
844
- "rewards/rejected": -3.7561733722686768,
845
- "step": 530
846
- },
847
- {
848
- "epoch": 0.61,
849
- "learning_rate": 1.9699048909929518e-07,
850
- "logits/chosen": -0.36763715744018555,
851
- "logits/rejected": 0.22473928332328796,
852
- "logps/chosen": -512.9124755859375,
853
- "logps/rejected": -511.940185546875,
854
- "loss": 0.4117,
855
- "rewards/accuracies": 0.78125,
856
- "rewards/chosen": -1.8377758264541626,
857
- "rewards/margins": 1.3640468120574951,
858
- "rewards/rejected": -3.201822280883789,
859
- "step": 540
860
- },
861
- {
862
- "epoch": 0.62,
863
- "learning_rate": 1.8736775742659732e-07,
864
- "logits/chosen": -0.5843815803527832,
865
- "logits/rejected": 0.3155004680156708,
866
- "logps/chosen": -550.1474609375,
867
- "logps/rejected": -476.39599609375,
868
- "loss": 0.4227,
869
- "rewards/accuracies": 0.8125,
870
- "rewards/chosen": -1.5097029209136963,
871
- "rewards/margins": 1.6035629510879517,
872
- "rewards/rejected": -3.1132657527923584,
873
- "step": 550
874
- },
875
- {
876
- "epoch": 0.63,
877
- "learning_rate": 1.7784306501824616e-07,
878
- "logits/chosen": -0.36525958776474,
879
- "logits/rejected": 0.3845716118812561,
880
- "logps/chosen": -492.0074157714844,
881
- "logps/rejected": -488.03973388671875,
882
- "loss": 0.3957,
883
- "rewards/accuracies": 0.768750011920929,
884
- "rewards/chosen": -1.8473739624023438,
885
- "rewards/margins": 1.4613382816314697,
886
- "rewards/rejected": -3.3087124824523926,
887
- "step": 560
888
- },
889
- {
890
- "epoch": 0.65,
891
- "learning_rate": 1.6843132102963025e-07,
892
- "logits/chosen": -0.15429559350013733,
893
- "logits/rejected": 0.795578122138977,
894
- "logps/chosen": -539.8087158203125,
895
- "logps/rejected": -548.82080078125,
896
- "loss": 0.39,
897
- "rewards/accuracies": 0.800000011920929,
898
- "rewards/chosen": -1.8266279697418213,
899
- "rewards/margins": 1.7545779943466187,
900
- "rewards/rejected": -3.5812058448791504,
901
- "step": 570
902
- },
903
- {
904
- "epoch": 0.66,
905
- "learning_rate": 1.591472578161458e-07,
906
- "logits/chosen": 0.2290324717760086,
907
- "logits/rejected": 1.0089889764785767,
908
- "logps/chosen": -541.7909545898438,
909
- "logps/rejected": -607.4434814453125,
910
- "loss": 0.3965,
911
- "rewards/accuracies": 0.800000011920929,
912
- "rewards/chosen": -2.054337978363037,
913
- "rewards/margins": 1.6381967067718506,
914
- "rewards/rejected": -3.6925346851348877,
915
- "step": 580
916
- },
917
- {
918
- "epoch": 0.67,
919
- "learning_rate": 1.5000540787240274e-07,
920
- "logits/chosen": 0.2819755971431732,
921
- "logits/rejected": 1.1927533149719238,
922
- "logps/chosen": -555.900634765625,
923
- "logps/rejected": -548.664794921875,
924
- "loss": 0.3743,
925
- "rewards/accuracies": 0.7749999761581421,
926
- "rewards/chosen": -2.2302324771881104,
927
- "rewards/margins": 1.4978115558624268,
928
- "rewards/rejected": -3.728043794631958,
929
- "step": 590
930
- },
931
- {
932
- "epoch": 0.68,
933
- "learning_rate": 1.410200810842749e-07,
934
- "logits/chosen": 0.04086022078990936,
935
- "logits/rejected": 1.0483381748199463,
936
- "logps/chosen": -596.6246337890625,
937
- "logps/rejected": -592.1336669921875,
938
- "loss": 0.3955,
939
- "rewards/accuracies": 0.793749988079071,
940
- "rewards/chosen": -2.0544655323028564,
941
- "rewards/margins": 1.7540998458862305,
942
- "rewards/rejected": -3.808565616607666,
943
- "step": 600
944
- },
945
- {
946
- "epoch": 0.68,
947
- "eval_logits/chosen": 0.3256094753742218,
948
- "eval_logits/rejected": 0.9437094330787659,
949
- "eval_logps/chosen": -523.08203125,
950
- "eval_logps/rejected": -643.4159545898438,
951
- "eval_loss": 0.5006277561187744,
952
- "eval_rewards/accuracies": 0.75390625,
953
- "eval_rewards/chosen": -2.660423994064331,
954
- "eval_rewards/margins": 1.200202465057373,
955
- "eval_rewards/rejected": -3.860626220703125,
956
- "eval_runtime": 52.9758,
957
- "eval_samples_per_second": 37.753,
958
- "eval_steps_per_second": 0.604,
959
- "step": 600
960
- },
961
- {
962
- "epoch": 0.69,
963
- "learning_rate": 1.322053423294041e-07,
964
- "logits/chosen": -0.05821552872657776,
965
- "logits/rejected": 0.6667032837867737,
966
- "logps/chosen": -546.7453002929688,
967
- "logps/rejected": -559.441162109375,
968
- "loss": 0.3896,
969
- "rewards/accuracies": 0.7875000238418579,
970
- "rewards/chosen": -2.0775649547576904,
971
- "rewards/margins": 1.4067105054855347,
972
- "rewards/rejected": -3.4842753410339355,
973
- "step": 610
974
- },
975
- {
976
- "epoch": 0.7,
977
- "learning_rate": 1.2357498946121905e-07,
978
- "logits/chosen": -0.10344459116458893,
979
- "logits/rejected": 1.0869953632354736,
980
- "logps/chosen": -550.6931762695312,
981
- "logps/rejected": -573.5672607421875,
982
- "loss": 0.3981,
983
- "rewards/accuracies": 0.8500000238418579,
984
- "rewards/chosen": -1.9934539794921875,
985
- "rewards/margins": 1.9642198085784912,
986
- "rewards/rejected": -3.9576735496520996,
987
- "step": 620
988
- },
989
- {
990
- "epoch": 0.71,
991
- "learning_rate": 1.1514253171093161e-07,
992
- "logits/chosen": -0.17007485032081604,
993
- "logits/rejected": 0.8311668634414673,
994
- "logps/chosen": -552.5477294921875,
995
- "logps/rejected": -532.693359375,
996
- "loss": 0.3833,
997
- "rewards/accuracies": 0.800000011920929,
998
- "rewards/chosen": -1.8977361917495728,
999
- "rewards/margins": 1.6777369976043701,
1000
- "rewards/rejected": -3.5754730701446533,
1001
- "step": 630
1002
- },
1003
- {
1004
- "epoch": 0.72,
1005
- "learning_rate": 1.0692116854131883e-07,
1006
- "logits/chosen": -0.25620418787002563,
1007
- "logits/rejected": 0.6494542956352234,
1008
- "logps/chosen": -531.3941650390625,
1009
- "logps/rejected": -553.3253784179688,
1010
- "loss": 0.4197,
1011
- "rewards/accuracies": 0.8125,
1012
- "rewards/chosen": -1.9536300897598267,
1013
- "rewards/margins": 1.6977847814559937,
1014
- "rewards/rejected": -3.6514148712158203,
1015
- "step": 640
1016
- },
1017
- {
1018
- "epoch": 0.74,
1019
- "learning_rate": 9.89237689853889e-08,
1020
- "logits/chosen": -0.31330204010009766,
1021
- "logits/rejected": 0.36232098937034607,
1022
- "logps/chosen": -546.025390625,
1023
- "logps/rejected": -602.3193359375,
1024
- "loss": 0.3932,
1025
- "rewards/accuracies": 0.8374999761581421,
1026
- "rewards/chosen": -2.0167908668518066,
1027
- "rewards/margins": 1.6244287490844727,
1028
- "rewards/rejected": -3.6412200927734375,
1029
- "step": 650
1030
- },
1031
- {
1032
- "epoch": 0.75,
1033
- "learning_rate": 9.11628515022765e-08,
1034
- "logits/chosen": -0.31195029616355896,
1035
- "logits/rejected": 0.7031415700912476,
1036
- "logps/chosen": -575.6533813476562,
1037
- "logps/rejected": -550.7911376953125,
1038
- "loss": 0.3775,
1039
- "rewards/accuracies": 0.8374999761581421,
1040
- "rewards/chosen": -1.9456446170806885,
1041
- "rewards/margins": 1.9410841464996338,
1042
- "rewards/rejected": -3.8867290019989014,
1043
- "step": 660
1044
- },
1045
- {
1046
- "epoch": 0.76,
1047
- "learning_rate": 8.365056438189486e-08,
1048
- "logits/chosen": -0.1329679787158966,
1049
- "logits/rejected": 0.6911096572875977,
1050
- "logps/chosen": -526.4910278320312,
1051
- "logps/rejected": -568.0216064453125,
1052
- "loss": 0.3821,
1053
- "rewards/accuracies": 0.793749988079071,
1054
- "rewards/chosen": -2.2285189628601074,
1055
- "rewards/margins": 1.4839597940444946,
1056
- "rewards/rejected": -3.7124786376953125,
1057
- "step": 670
1058
- },
1059
- {
1060
- "epoch": 0.77,
1061
- "learning_rate": 7.639866672902101e-08,
1062
- "logits/chosen": 0.3368561863899231,
1063
- "logits/rejected": 1.2245566844940186,
1064
- "logps/chosen": -586.1475219726562,
1065
- "logps/rejected": -629.6542358398438,
1066
- "loss": 0.4028,
1067
- "rewards/accuracies": 0.800000011920929,
1068
- "rewards/chosen": -2.569129467010498,
1069
- "rewards/margins": 1.8280231952667236,
1070
- "rewards/rejected": -4.397152423858643,
1071
- "step": 680
1072
- },
1073
- {
1074
- "epoch": 0.78,
1075
- "learning_rate": 6.941851005657851e-08,
1076
- "logits/chosen": -0.06485060602426529,
1077
- "logits/rejected": 0.7306933403015137,
1078
- "logps/chosen": -572.1720581054688,
1079
- "logps/rejected": -576.8677978515625,
1080
- "loss": 0.3819,
1081
- "rewards/accuracies": 0.8500000238418579,
1082
- "rewards/chosen": -2.195324182510376,
1083
- "rewards/margins": 1.7558557987213135,
1084
- "rewards/rejected": -3.9511799812316895,
1085
- "step": 690
1086
- },
1087
- {
1088
- "epoch": 0.79,
1089
- "learning_rate": 6.272102051693051e-08,
1090
- "logits/chosen": 0.16706393659114838,
1091
- "logits/rejected": 0.8311805725097656,
1092
- "logps/chosen": -546.9618530273438,
1093
- "logps/rejected": -599.8685302734375,
1094
- "loss": 0.3779,
1095
- "rewards/accuracies": 0.7875000238418579,
1096
- "rewards/chosen": -2.3925366401672363,
1097
- "rewards/margins": 1.6777336597442627,
1098
- "rewards/rejected": -4.070270538330078,
1099
- "step": 700
1100
- },
1101
- {
1102
- "epoch": 0.79,
1103
- "eval_logits/chosen": 0.251458078622818,
1104
- "eval_logits/rejected": 0.8019012808799744,
1105
- "eval_logps/chosen": -539.7506713867188,
1106
- "eval_logps/rejected": -666.2688598632812,
1107
- "eval_loss": 0.49509289860725403,
1108
- "eval_rewards/accuracies": 0.765625,
1109
- "eval_rewards/chosen": -2.8271102905273438,
1110
- "eval_rewards/margins": 1.2620453834533691,
1111
- "eval_rewards/rejected": -4.089155673980713,
1112
- "eval_runtime": 52.9381,
1113
- "eval_samples_per_second": 37.78,
1114
- "eval_steps_per_second": 0.604,
1115
- "step": 700
1116
- },
1117
- {
1118
- "epoch": 0.8,
1119
- "learning_rate": 5.6316681798995844e-08,
1120
- "logits/chosen": 0.029453057795763016,
1121
- "logits/rejected": 0.8492851257324219,
1122
- "logps/chosen": -595.5863037109375,
1123
- "logps/rejected": -619.2969970703125,
1124
- "loss": 0.4273,
1125
- "rewards/accuracies": 0.7875000238418579,
1126
- "rewards/chosen": -2.4548239707946777,
1127
- "rewards/margins": 1.7372827529907227,
1128
- "rewards/rejected": -4.192107200622559,
1129
- "step": 710
1130
- },
1131
- {
1132
- "epoch": 0.81,
1133
- "learning_rate": 5.0215518717961256e-08,
1134
- "logits/chosen": 0.2557750344276428,
1135
- "logits/rejected": 0.9654140472412109,
1136
- "logps/chosen": -543.8341064453125,
1137
- "logps/rejected": -580.2445068359375,
1138
- "loss": 0.403,
1139
- "rewards/accuracies": 0.78125,
1140
- "rewards/chosen": -2.4623947143554688,
1141
- "rewards/margins": 1.6467905044555664,
1142
- "rewards/rejected": -4.109185218811035,
1143
- "step": 720
1144
- },
1145
- {
1146
- "epoch": 0.83,
1147
- "learning_rate": 4.4427081523275925e-08,
1148
- "logits/chosen": 0.16709019243717194,
1149
- "logits/rejected": 0.9560591578483582,
1150
- "logps/chosen": -535.726318359375,
1151
- "logps/rejected": -586.334228515625,
1152
- "loss": 0.3846,
1153
- "rewards/accuracies": 0.831250011920929,
1154
- "rewards/chosen": -2.379093647003174,
1155
- "rewards/margins": 1.706425428390503,
1156
- "rewards/rejected": -4.085518836975098,
1157
- "step": 730
1158
- },
1159
- {
1160
- "epoch": 0.84,
1161
- "learning_rate": 3.896043094949061e-08,
1162
- "logits/chosen": -0.04919125884771347,
1163
- "logits/rejected": 0.726272463798523,
1164
- "logps/chosen": -575.3306884765625,
1165
- "logps/rejected": -597.0858154296875,
1166
- "loss": 0.4039,
1167
- "rewards/accuracies": 0.7875000238418579,
1168
- "rewards/chosen": -2.4570398330688477,
1169
- "rewards/margins": 1.6302992105484009,
1170
- "rewards/rejected": -4.087338924407959,
1171
- "step": 740
1172
- },
1173
- {
1174
- "epoch": 0.85,
1175
- "learning_rate": 3.3824124033343557e-08,
1176
- "logits/chosen": 0.028565894812345505,
1177
- "logits/rejected": 0.8538961410522461,
1178
- "logps/chosen": -566.6370849609375,
1179
- "logps/rejected": -576.2894287109375,
1180
- "loss": 0.3796,
1181
- "rewards/accuracies": 0.78125,
1182
- "rewards/chosen": -2.445554256439209,
1183
- "rewards/margins": 1.5305721759796143,
1184
- "rewards/rejected": -3.976126194000244,
1185
- "step": 750
1186
- },
1187
- {
1188
- "epoch": 0.86,
1189
- "learning_rate": 2.9026200719291904e-08,
1190
- "logits/chosen": 0.14750410616397858,
1191
- "logits/rejected": 0.8671531677246094,
1192
- "logps/chosen": -491.7998962402344,
1193
- "logps/rejected": -554.5946655273438,
1194
- "loss": 0.3931,
1195
- "rewards/accuracies": 0.856249988079071,
1196
- "rewards/chosen": -2.194965362548828,
1197
- "rewards/margins": 1.5993218421936035,
1198
- "rewards/rejected": -3.7942872047424316,
1199
- "step": 760
1200
- },
1201
- {
1202
- "epoch": 0.87,
1203
- "learning_rate": 2.4574171274456433e-08,
1204
- "logits/chosen": 0.05547152832150459,
1205
- "logits/rejected": 0.8580353856086731,
1206
- "logps/chosen": -593.9608764648438,
1207
- "logps/rejected": -608.4605712890625,
1208
- "loss": 0.3707,
1209
- "rewards/accuracies": 0.78125,
1210
- "rewards/chosen": -2.500046491622925,
1211
- "rewards/margins": 1.6873514652252197,
1212
- "rewards/rejected": -4.1873979568481445,
1213
- "step": 770
1214
- },
1215
- {
1216
- "epoch": 0.88,
1217
- "learning_rate": 2.047500453267881e-08,
1218
- "logits/chosen": 0.07607009261846542,
1219
- "logits/rejected": 0.9816274642944336,
1220
- "logps/chosen": -613.8623657226562,
1221
- "logps/rejected": -627.4935302734375,
1222
- "loss": 0.3864,
1223
- "rewards/accuracies": 0.768750011920929,
1224
- "rewards/chosen": -2.576051712036133,
1225
- "rewards/margins": 1.8002961874008179,
1226
- "rewards/rejected": -4.37634801864624,
1227
- "step": 780
1228
- },
1229
- {
1230
- "epoch": 0.89,
1231
- "learning_rate": 1.673511698609292e-08,
1232
- "logits/chosen": -0.030661270022392273,
1233
- "logits/rejected": 0.8080152273178101,
1234
- "logps/chosen": -595.865478515625,
1235
- "logps/rejected": -620.65966796875,
1236
- "loss": 0.3947,
1237
- "rewards/accuracies": 0.8125,
1238
- "rewards/chosen": -2.505141496658325,
1239
- "rewards/margins": 1.7527068853378296,
1240
- "rewards/rejected": -4.257847785949707,
1241
- "step": 790
1242
- },
1243
- {
1244
- "epoch": 0.91,
1245
- "learning_rate": 1.3360362741285769e-08,
1246
- "logits/chosen": 0.3820663094520569,
1247
- "logits/rejected": 1.0134365558624268,
1248
- "logps/chosen": -519.1594848632812,
1249
- "logps/rejected": -563.0218505859375,
1250
- "loss": 0.3845,
1251
- "rewards/accuracies": 0.7562500238418579,
1252
- "rewards/chosen": -2.459944725036621,
1253
- "rewards/margins": 1.5404003858566284,
1254
- "rewards/rejected": -4.000345706939697,
1255
- "step": 800
1256
- },
1257
- {
1258
- "epoch": 0.91,
1259
- "eval_logits/chosen": 0.2531813681125641,
1260
- "eval_logits/rejected": 0.7844187021255493,
1261
- "eval_logps/chosen": -558.243408203125,
1262
- "eval_logps/rejected": -683.8961791992188,
1263
- "eval_loss": 0.49323520064353943,
1264
- "eval_rewards/accuracies": 0.76953125,
1265
- "eval_rewards/chosen": -3.012037754058838,
1266
- "eval_rewards/margins": 1.2533915042877197,
1267
- "eval_rewards/rejected": -4.2654290199279785,
1268
- "eval_runtime": 52.9535,
1269
- "eval_samples_per_second": 37.769,
1270
- "eval_steps_per_second": 0.604,
1271
- "step": 800
1272
- },
1273
- {
1274
- "epoch": 0.92,
1275
- "learning_rate": 1.0356024355769433e-08,
1276
- "logits/chosen": 0.051693208515644073,
1277
- "logits/rejected": 0.7009769678115845,
1278
- "logps/chosen": -603.3291015625,
1279
- "logps/rejected": -616.5624389648438,
1280
- "loss": 0.3942,
1281
- "rewards/accuracies": 0.793749988079071,
1282
- "rewards/chosen": -2.494053602218628,
1283
- "rewards/margins": 1.5036752223968506,
1284
- "rewards/rejected": -3.9977290630340576,
1285
- "step": 810
1286
- },
1287
- {
1288
- "epoch": 0.93,
1289
- "learning_rate": 7.726804569108597e-09,
1290
- "logits/chosen": 0.16616474092006683,
1291
- "logits/rejected": 1.1278895139694214,
1292
- "logps/chosen": -569.5582275390625,
1293
- "logps/rejected": -595.67724609375,
1294
- "loss": 0.3891,
1295
- "rewards/accuracies": 0.768750011920929,
1296
- "rewards/chosen": -2.447701930999756,
1297
- "rewards/margins": 1.670259714126587,
1298
- "rewards/rejected": -4.1179609298706055,
1299
- "step": 820
1300
- },
1301
- {
1302
- "epoch": 0.94,
1303
- "learning_rate": 5.476818941645561e-09,
1304
- "logits/chosen": 0.22033889591693878,
1305
- "logits/rejected": 0.872348964214325,
1306
- "logps/chosen": -580.0589599609375,
1307
- "logps/rejected": -621.200439453125,
1308
- "loss": 0.3729,
1309
- "rewards/accuracies": 0.84375,
1310
- "rewards/chosen": -2.370396137237549,
1311
- "rewards/margins": 1.893420934677124,
1312
- "rewards/rejected": -4.263816833496094,
1313
- "step": 830
1314
- },
1315
- {
1316
- "epoch": 0.95,
1317
- "learning_rate": 3.609589412347347e-09,
1318
- "logits/chosen": -0.021263647824525833,
1319
- "logits/rejected": 0.8406580686569214,
1320
- "logps/chosen": -605.0250244140625,
1321
- "logps/rejected": -622.5994873046875,
1322
- "loss": 0.4117,
1323
- "rewards/accuracies": 0.831250011920929,
1324
- "rewards/chosen": -2.3371009826660156,
1325
- "rewards/margins": 1.920668601989746,
1326
- "rewards/rejected": -4.2577691078186035,
1327
- "step": 840
1328
- },
1329
- {
1330
- "epoch": 0.96,
1331
- "learning_rate": 2.1280387858572667e-09,
1332
- "logits/chosen": 0.2490266114473343,
1333
- "logits/rejected": 0.9580374956130981,
1334
- "logps/chosen": -572.4114990234375,
1335
- "logps/rejected": -646.89892578125,
1336
- "loss": 0.3982,
1337
- "rewards/accuracies": 0.8062499761581421,
1338
- "rewards/chosen": -2.5151896476745605,
1339
- "rewards/margins": 1.6319019794464111,
1340
- "rewards/rejected": -4.147091388702393,
1341
- "step": 850
1342
- },
1343
- {
1344
- "epoch": 0.97,
1345
- "learning_rate": 1.03448615738172e-09,
1346
- "logits/chosen": 0.03189245983958244,
1347
- "logits/rejected": 1.1285068988800049,
1348
- "logps/chosen": -562.9724731445312,
1349
- "logps/rejected": -615.1611328125,
1350
- "loss": 0.3952,
1351
- "rewards/accuracies": 0.8500000238418579,
1352
- "rewards/chosen": -2.4744155406951904,
1353
- "rewards/margins": 1.8728519678115845,
1354
- "rewards/rejected": -4.3472676277160645,
1355
- "step": 860
1356
- },
1357
- {
1358
- "epoch": 0.98,
1359
- "learning_rate": 3.3064328257259575e-10,
1360
- "logits/chosen": -0.04908572882413864,
1361
- "logits/rejected": 0.6742405891418457,
1362
- "logps/chosen": -604.3118896484375,
1363
- "logps/rejected": -606.341552734375,
1364
- "loss": 0.3868,
1365
- "rewards/accuracies": 0.793749988079071,
1366
- "rewards/chosen": -2.5000436305999756,
1367
- "rewards/margins": 1.6203769445419312,
1368
- "rewards/rejected": -4.120420932769775,
1369
- "step": 870
1370
- },
1371
- {
1372
- "epoch": 1.0,
1373
- "learning_rate": 1.7611898088715216e-11,
1374
- "logits/chosen": -0.092198446393013,
1375
- "logits/rejected": 0.9236906170845032,
1376
- "logps/chosen": -669.6597900390625,
1377
- "logps/rejected": -655.7959594726562,
1378
- "loss": 0.3909,
1379
- "rewards/accuracies": 0.8187500238418579,
1380
- "rewards/chosen": -2.540384292602539,
1381
- "rewards/margins": 1.8847181797027588,
1382
- "rewards/rejected": -4.425102710723877,
1383
- "step": 880
1384
- },
1385
  {
1386
  "epoch": 1.0,
1387
- "step": 883,
1388
  "total_flos": 0.0,
1389
- "train_loss": 0.4371140412269065,
1390
- "train_runtime": 8003.3982,
1391
- "train_samples_per_second": 14.123,
1392
- "train_steps_per_second": 0.11
1393
  }
1394
  ],
1395
  "logging_steps": 10,
1396
- "max_steps": 883,
1397
  "num_train_epochs": 1,
1398
  "save_steps": 100,
1399
  "total_flos": 0.0,
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 0.998766954377312,
5
  "eval_steps": 100,
6
+ "global_step": 405,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
  "epoch": 0.0,
13
+ "learning_rate": 1.2195121951219512e-08,
14
+ "logits/chosen": -2.8088459968566895,
15
+ "logits/rejected": -2.7595884799957275,
16
+ "logps/chosen": -368.90777587890625,
17
+ "logps/rejected": -133.10202026367188,
18
+ "loss": 0.3669,
19
  "rewards/accuracies": 0.0,
20
  "rewards/chosen": 0.0,
21
  "rewards/margins": 0.0,
 
23
  "step": 1
24
  },
25
  {
26
+ "epoch": 0.02,
27
+ "learning_rate": 1.219512195121951e-07,
28
+ "logits/chosen": -2.838677406311035,
29
+ "logits/rejected": -2.8248190879821777,
30
+ "logps/chosen": -433.822265625,
31
+ "logps/rejected": -114.71543884277344,
32
+ "loss": 0.3373,
33
+ "rewards/accuracies": 0.5555555820465088,
34
+ "rewards/chosen": 0.0010175479110330343,
35
+ "rewards/margins": 0.0018583540804684162,
36
+ "rewards/rejected": -0.0008408060530200601,
37
  "step": 10
38
  },
39
  {
40
+ "epoch": 0.05,
41
+ "learning_rate": 2.439024390243902e-07,
42
+ "logits/chosen": -2.798461437225342,
43
+ "logits/rejected": -2.765454053878784,
44
+ "logps/chosen": -436.7164001464844,
45
+ "logps/rejected": -109.3239517211914,
46
+ "loss": 0.3366,
47
+ "rewards/accuracies": 0.71875,
48
+ "rewards/chosen": 0.020252179354429245,
49
+ "rewards/margins": 0.03614808991551399,
50
+ "rewards/rejected": -0.015895914286375046,
51
  "step": 20
52
  },
53
  {
54
+ "epoch": 0.07,
55
+ "learning_rate": 3.6585365853658536e-07,
56
+ "logits/chosen": -2.7184653282165527,
57
+ "logits/rejected": -2.6913540363311768,
58
+ "logps/chosen": -422.36480712890625,
59
+ "logps/rejected": -127.92415618896484,
60
+ "loss": 0.3034,
61
+ "rewards/accuracies": 0.78125,
62
+ "rewards/chosen": 0.06996239721775055,
63
+ "rewards/margins": 0.19669881463050842,
64
+ "rewards/rejected": -0.12673643231391907,
65
  "step": 30
66
  },
67
  {
68
+ "epoch": 0.1,
69
+ "learning_rate": 4.878048780487804e-07,
70
+ "logits/chosen": -2.592528820037842,
71
+ "logits/rejected": -2.5740997791290283,
72
+ "logps/chosen": -396.34332275390625,
73
+ "logps/rejected": -138.47140502929688,
74
+ "loss": 0.2563,
75
+ "rewards/accuracies": 0.768750011920929,
76
+ "rewards/chosen": 0.023515433073043823,
77
+ "rewards/margins": 0.41449323296546936,
78
+ "rewards/rejected": -0.39097777009010315,
79
  "step": 40
80
  },
81
  {
82
+ "epoch": 0.12,
83
+ "learning_rate": 4.992461696250783e-07,
84
+ "logits/chosen": -2.425698757171631,
85
+ "logits/rejected": -2.399880886077881,
86
+ "logps/chosen": -445.71978759765625,
87
+ "logps/rejected": -201.20761108398438,
88
+ "loss": 0.1773,
89
+ "rewards/accuracies": 0.800000011920929,
90
+ "rewards/chosen": -0.034065067768096924,
91
+ "rewards/margins": 0.8275578618049622,
92
+ "rewards/rejected": -0.8616229295730591,
93
  "step": 50
94
  },
95
  {
96
+ "epoch": 0.15,
97
+ "learning_rate": 4.966461721767899e-07,
98
+ "logits/chosen": -2.4016242027282715,
99
+ "logits/rejected": -2.3502964973449707,
100
+ "logps/chosen": -424.775390625,
101
+ "logps/rejected": -253.54776000976562,
102
+ "loss": 0.1294,
103
+ "rewards/accuracies": 0.75,
104
+ "rewards/chosen": -0.3768869638442993,
105
+ "rewards/margins": 0.9074532389640808,
106
+ "rewards/rejected": -1.2843403816223145,
107
  "step": 60
108
  },
109
  {
110
+ "epoch": 0.17,
111
+ "learning_rate": 4.922100518015975e-07,
112
+ "logits/chosen": -2.43666410446167,
113
+ "logits/rejected": -2.387927293777466,
114
+ "logps/chosen": -420.531494140625,
115
+ "logps/rejected": -273.5174255371094,
116
+ "loss": 0.1116,
117
+ "rewards/accuracies": 0.768750011920929,
118
+ "rewards/chosen": -0.3666774034500122,
119
+ "rewards/margins": 1.1816540956497192,
120
+ "rewards/rejected": -1.548331618309021,
121
  "step": 70
122
  },
123
  {
124
+ "epoch": 0.2,
125
+ "learning_rate": 4.859708325770919e-07,
126
+ "logits/chosen": -2.37559175491333,
127
+ "logits/rejected": -2.327603816986084,
128
+ "logps/chosen": -472.6153259277344,
129
+ "logps/rejected": -317.5882873535156,
130
+ "loss": 0.0637,
131
+ "rewards/accuracies": 0.7250000238418579,
132
+ "rewards/chosen": -0.8155827522277832,
133
+ "rewards/margins": 1.3035672903060913,
134
+ "rewards/rejected": -2.119150161743164,
135
  "step": 80
136
  },
137
  {
138
+ "epoch": 0.22,
139
+ "learning_rate": 4.779749614980225e-07,
140
+ "logits/chosen": -2.3662772178649902,
141
+ "logits/rejected": -2.3145246505737305,
142
+ "logps/chosen": -546.580810546875,
143
+ "logps/rejected": -391.6395263671875,
144
+ "loss": 0.0501,
145
+ "rewards/accuracies": 0.8374999761581421,
146
+ "rewards/chosen": -0.7051855325698853,
147
+ "rewards/margins": 1.912410020828247,
148
+ "rewards/rejected": -2.617595672607422,
149
  "step": 90
150
  },
151
  {
152
+ "epoch": 0.25,
153
+ "learning_rate": 4.682819627081427e-07,
154
+ "logits/chosen": -2.3446455001831055,
155
+ "logits/rejected": -2.278437852859497,
156
+ "logps/chosen": -482.21063232421875,
157
+ "logps/rejected": -363.7936096191406,
158
+ "loss": 0.0594,
159
+ "rewards/accuracies": 0.831250011920929,
160
+ "rewards/chosen": -0.6942282915115356,
161
+ "rewards/margins": 1.7591311931610107,
162
+ "rewards/rejected": -2.4533591270446777,
163
  "step": 100
164
  },
165
  {
166
+ "epoch": 0.25,
167
+ "eval_logits/chosen": -2.340813636779785,
168
+ "eval_logits/rejected": -2.327035903930664,
169
+ "eval_logps/chosen": -428.9503173828125,
170
+ "eval_logps/rejected": -451.85736083984375,
171
+ "eval_loss": 0.10351637005805969,
172
+ "eval_rewards/accuracies": 0.6171875,
173
+ "eval_rewards/chosen": -1.7191063165664673,
174
+ "eval_rewards/margins": 0.22593416273593903,
175
+ "eval_rewards/rejected": -1.9450405836105347,
176
+ "eval_runtime": 53.3665,
177
+ "eval_samples_per_second": 37.477,
178
+ "eval_steps_per_second": 0.6,
179
  "step": 100
180
  },
181
  {
182
+ "epoch": 0.27,
183
+ "learning_rate": 4.569639943810477e-07,
184
+ "logits/chosen": -2.3267300128936768,
185
+ "logits/rejected": -2.256336212158203,
186
+ "logps/chosen": -502.18572998046875,
187
+ "logps/rejected": -387.1337890625,
188
+ "loss": 0.0472,
189
+ "rewards/accuracies": 0.768750011920929,
190
+ "rewards/chosen": -0.9502252340316772,
191
+ "rewards/margins": 1.751552939414978,
192
+ "rewards/rejected": -2.7017781734466553,
193
  "step": 110
194
  },
195
  {
196
+ "epoch": 0.3,
197
+ "learning_rate": 4.4410531154874543e-07,
198
+ "logits/chosen": -2.3445639610290527,
199
+ "logits/rejected": -2.2553389072418213,
200
+ "logps/chosen": -552.4199829101562,
201
+ "logps/rejected": -416.80755615234375,
202
+ "loss": 0.0477,
203
+ "rewards/accuracies": 0.78125,
204
+ "rewards/chosen": -1.0058166980743408,
205
+ "rewards/margins": 1.8569440841674805,
206
+ "rewards/rejected": -2.8627610206604004,
207
  "step": 120
208
  },
209
  {
210
+ "epoch": 0.32,
211
+ "learning_rate": 4.298016388768561e-07,
212
+ "logits/chosen": -2.396329641342163,
213
+ "logits/rejected": -2.322551727294922,
214
+ "logps/chosen": -542.0057373046875,
215
+ "logps/rejected": -407.68634033203125,
216
+ "loss": 0.0418,
217
+ "rewards/accuracies": 0.8187500238418579,
218
+ "rewards/chosen": -0.768031895160675,
219
+ "rewards/margins": 2.077030658721924,
220
+ "rewards/rejected": -2.845062017440796,
221
  "step": 130
222
  },
223
  {
224
+ "epoch": 0.35,
225
+ "learning_rate": 4.1415945805573005e-07,
226
+ "logits/chosen": -2.3263237476348877,
227
+ "logits/rejected": -2.2574667930603027,
228
+ "logps/chosen": -506.77471923828125,
229
+ "logps/rejected": -388.97479248046875,
230
+ "loss": 0.0506,
231
+ "rewards/accuracies": 0.8125,
232
+ "rewards/chosen": -0.8580313920974731,
233
+ "rewards/margins": 1.7057987451553345,
234
+ "rewards/rejected": -2.5638298988342285,
235
  "step": 140
236
  },
237
  {
238
+ "epoch": 0.37,
239
+ "learning_rate": 3.972952151123984e-07,
240
+ "logits/chosen": -2.3322761058807373,
241
+ "logits/rejected": -2.2486355304718018,
242
+ "logps/chosen": -450.03778076171875,
243
+ "logps/rejected": -351.47064208984375,
244
+ "loss": 0.0528,
245
+ "rewards/accuracies": 0.824999988079071,
246
+ "rewards/chosen": -0.7531972527503967,
247
+ "rewards/margins": 1.7522554397583008,
248
+ "rewards/rejected": -2.505452871322632,
249
  "step": 150
250
  },
251
  {
252
+ "epoch": 0.39,
253
+ "learning_rate": 3.793344535444142e-07,
254
+ "logits/chosen": -2.298706531524658,
255
+ "logits/rejected": -2.205777168273926,
256
+ "logps/chosen": -549.6655883789062,
257
+ "logps/rejected": -407.4877624511719,
258
+ "loss": 0.0361,
259
+ "rewards/accuracies": 0.8187500238418579,
260
+ "rewards/chosen": -0.8248310089111328,
261
+ "rewards/margins": 2.1384449005126953,
262
+ "rewards/rejected": -2.963275909423828,
263
  "step": 160
264
  },
265
  {
266
+ "epoch": 0.42,
267
+ "learning_rate": 3.604108797288461e-07,
268
+ "logits/chosen": -2.301478862762451,
269
+ "logits/rejected": -2.199977397918701,
270
+ "logps/chosen": -550.0228271484375,
271
+ "logps/rejected": -447.4345703125,
272
+ "loss": 0.0349,
273
+ "rewards/accuracies": 0.831250011920929,
274
+ "rewards/chosen": -1.1104724407196045,
275
+ "rewards/margins": 2.2591710090637207,
276
+ "rewards/rejected": -3.369643449783325,
277
  "step": 170
278
  },
279
  {
+ "epoch": 0.44,
+ "learning_rate": 3.40665367563858e-07,
+ "logits/chosen": -2.2790443897247314,
+ "logits/rejected": -2.1830639839172363,
+ "logps/chosen": -540.7822265625,
+ "logps/rejected": -438.80816650390625,
+ "loss": 0.0358,
+ "rewards/accuracies": 0.7875000238418579,
+ "rewards/chosen": -1.3068325519561768,
+ "rewards/margins": 1.9258372783660889,
+ "rewards/rejected": -3.2326698303222656,
  "step": 180
  },
  {
+ "epoch": 0.47,
+ "learning_rate": 3.202449097526798e-07,
+ "logits/chosen": -2.2940845489501953,
+ "logits/rejected": -2.213531732559204,
+ "logps/chosen": -518.0568237304688,
+ "logps/rejected": -424.33331298828125,
+ "loss": 0.0358,
+ "rewards/accuracies": 0.800000011920929,
+ "rewards/chosen": -1.1591523885726929,
+ "rewards/margins": 2.0107340812683105,
+ "rewards/rejected": -3.169886350631714,
  "step": 190
  },
  {
+ "epoch": 0.49,
+ "learning_rate": 2.993015235369905e-07,
+ "logits/chosen": -2.2501273155212402,
+ "logits/rejected": -2.1389498710632324,
+ "logps/chosen": -568.6901245117188,
+ "logps/rejected": -470.89617919921875,
+ "loss": 0.0329,
+ "rewards/accuracies": 0.8062499761581421,
+ "rewards/chosen": -1.2941691875457764,
+ "rewards/margins": 2.236302375793457,
+ "rewards/rejected": -3.5304713249206543,
  "step": 200
  },
  {
+ "epoch": 0.49,
+ "eval_logits/chosen": -2.2352473735809326,
+ "eval_logits/rejected": -2.214733362197876,
+ "eval_logps/chosen": -501.9567565917969,
+ "eval_logps/rejected": -538.0303955078125,
+ "eval_loss": 0.06932022422552109,
+ "eval_rewards/accuracies": 0.609375,
+ "eval_rewards/chosen": -2.449171304702759,
+ "eval_rewards/margins": 0.35759952664375305,
+ "eval_rewards/rejected": -2.8067705631256104,
+ "eval_runtime": 53.3061,
+ "eval_samples_per_second": 37.519,
+ "eval_steps_per_second": 0.6,
  "step": 200
  },
  {
+ "epoch": 0.52,
+ "learning_rate": 2.7799111902582693e-07,
+ "logits/chosen": -2.2516720294952393,
+ "logits/rejected": -2.1468265056610107,
+ "logps/chosen": -544.9647216796875,
+ "logps/rejected": -425.84832763671875,
+ "loss": 0.0319,
+ "rewards/accuracies": 0.731249988079071,
+ "rewards/chosen": -1.4447880983352661,
+ "rewards/margins": 1.7926721572875977,
+ "rewards/rejected": -3.2374606132507324,
  "step": 210
  },
  {
+ "epoch": 0.54,
+ "learning_rate": 2.564723385445869e-07,
+ "logits/chosen": -2.325510025024414,
+ "logits/rejected": -2.2458481788635254,
+ "logps/chosen": -532.0316772460938,
+ "logps/rejected": -426.2433166503906,
+ "loss": 0.0441,
+ "rewards/accuracies": 0.8125,
+ "rewards/chosen": -1.1441152095794678,
+ "rewards/margins": 1.8752161264419556,
+ "rewards/rejected": -3.019331455230713,
  "step": 220
  },
  {
+ "epoch": 0.57,
+ "learning_rate": 2.3490537564442845e-07,
+ "logits/chosen": -2.3061037063598633,
+ "logits/rejected": -2.2063522338867188,
+ "logps/chosen": -515.2584228515625,
+ "logps/rejected": -387.2288818359375,
+ "loss": 0.0536,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -1.2331289052963257,
+ "rewards/margins": 1.573769211769104,
+ "rewards/rejected": -2.806898355484009,
  "step": 230
  },
  {
+ "epoch": 0.59,
+ "learning_rate": 2.1345078256378801e-07,
+ "logits/chosen": -2.3259823322296143,
+ "logits/rejected": -2.232604503631592,
+ "logps/chosen": -529.44775390625,
+ "logps/rejected": -442.9454040527344,
+ "loss": 0.0384,
+ "rewards/accuracies": 0.768750011920929,
+ "rewards/chosen": -1.2063531875610352,
+ "rewards/margins": 2.0420820713043213,
+ "rewards/rejected": -3.2484352588653564,
  "step": 240
  },
  {
+ "epoch": 0.62,
+ "learning_rate": 1.9226827501969865e-07,
+ "logits/chosen": -2.310181140899658,
+ "logits/rejected": -2.225755214691162,
+ "logps/chosen": -569.6714477539062,
+ "logps/rejected": -482.9613342285156,
+ "loss": 0.0368,
+ "rewards/accuracies": 0.8374999761581421,
+ "rewards/chosen": -1.2699750661849976,
+ "rewards/margins": 2.3776299953460693,
+ "rewards/rejected": -3.6476047039031982,
  "step": 250
  },
  {
+ "epoch": 0.64,
+ "learning_rate": 1.715155432264775e-07,
+ "logits/chosen": -2.3007090091705322,
+ "logits/rejected": -2.2159204483032227,
+ "logps/chosen": -574.6656494140625,
+ "logps/rejected": -473.60528564453125,
+ "loss": 0.0275,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": -1.4263044595718384,
+ "rewards/margins": 2.146233081817627,
+ "rewards/rejected": -3.572537660598755,
  "step": 260
  },
  {
+ "epoch": 0.67,
+ "learning_rate": 1.51347077992983e-07,
+ "logits/chosen": -2.280165195465088,
+ "logits/rejected": -2.1988308429718018,
+ "logps/chosen": -573.0145874023438,
+ "logps/rejected": -490.4935607910156,
+ "loss": 0.024,
+ "rewards/accuracies": 0.8187500238418579,
+ "rewards/chosen": -1.6931577920913696,
+ "rewards/margins": 1.988318681716919,
+ "rewards/rejected": -3.68147611618042,
  "step": 270
  },
  {
+ "epoch": 0.69,
+ "learning_rate": 1.3191302063739906e-07,
+ "logits/chosen": -2.247427463531494,
+ "logits/rejected": -2.1717417240142822,
+ "logps/chosen": -552.9573364257812,
+ "logps/rejected": -480.90435791015625,
+ "loss": 0.0231,
+ "rewards/accuracies": 0.7562500238418579,
+ "rewards/chosen": -1.7376149892807007,
+ "rewards/margins": 1.9405027627944946,
+ "rewards/rejected": -3.678117275238037,
  "step": 280
  },
  {
+ "epoch": 0.72,
+ "learning_rate": 1.1335804528119475e-07,
+ "logits/chosen": -2.3430678844451904,
+ "logits/rejected": -2.2265610694885254,
+ "logps/chosen": -586.9962158203125,
+ "logps/rejected": -472.01611328125,
+ "loss": 0.0285,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -1.5123710632324219,
+ "rewards/margins": 2.2006583213806152,
+ "rewards/rejected": -3.713029384613037,
  "step": 290
  },
  {
+ "epoch": 0.74,
+ "learning_rate": 9.582028184286423e-08,
+ "logits/chosen": -2.2495548725128174,
+ "logits/rejected": -2.186642646789551,
+ "logps/chosen": -531.0364990234375,
+ "logps/rejected": -480.0726623535156,
+ "loss": 0.0312,
+ "rewards/accuracies": 0.7437499761581421,
+ "rewards/chosen": -1.7118114233016968,
+ "rewards/margins": 1.8730456829071045,
+ "rewards/rejected": -3.58485746383667,
  "step": 300
  },
  {
+ "epoch": 0.74,
+ "eval_logits/chosen": -2.2933216094970703,
+ "eval_logits/rejected": -2.2721123695373535,
+ "eval_logps/chosen": -501.1633605957031,
+ "eval_logps/rejected": -543.5177612304688,
+ "eval_loss": 0.06885366886854172,
+ "eval_rewards/accuracies": 0.61328125,
+ "eval_rewards/chosen": -2.441237449645996,
+ "eval_rewards/margins": 0.42040756344795227,
+ "eval_rewards/rejected": -2.861644983291626,
+ "eval_runtime": 53.2903,
+ "eval_samples_per_second": 37.53,
+ "eval_steps_per_second": 0.6,
  "step": 300
  },
  {
+ "epoch": 0.76,
+ "learning_rate": 7.943028774907065e-08,
+ "logits/chosen": -2.2719688415527344,
+ "logits/rejected": -2.1988675594329834,
+ "logps/chosen": -524.6929931640625,
+ "logps/rejected": -446.8042907714844,
+ "loss": 0.0349,
+ "rewards/accuracies": 0.793749988079071,
+ "rewards/chosen": -1.4022165536880493,
+ "rewards/margins": 1.8806768655776978,
+ "rewards/rejected": -3.282893419265747,
  "step": 310
  },
  {
+ "epoch": 0.79,
+ "learning_rate": 6.431007601814637e-08,
+ "logits/chosen": -2.2960824966430664,
+ "logits/rejected": -2.2386252880096436,
+ "logps/chosen": -477.001953125,
+ "logps/rejected": -436.0245666503906,
+ "loss": 0.0298,
+ "rewards/accuracies": 0.8374999761581421,
+ "rewards/chosen": -1.4929635524749756,
+ "rewards/margins": 1.7944204807281494,
+ "rewards/rejected": -3.287383556365967,
  "step": 320
  },
  {
+ "epoch": 0.81,
+ "learning_rate": 5.0572206951246e-08,
+ "logits/chosen": -2.277937650680542,
+ "logits/rejected": -2.1940300464630127,
+ "logps/chosen": -516.416015625,
+ "logps/rejected": -444.90032958984375,
+ "loss": 0.0329,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -1.4886820316314697,
+ "rewards/margins": 1.8972896337509155,
+ "rewards/rejected": -3.385971784591675,
  "step": 330
  },
  {
+ "epoch": 0.84,
+ "learning_rate": 3.831895019292897e-08,
+ "logits/chosen": -2.3472743034362793,
+ "logits/rejected": -2.266993999481201,
+ "logps/chosen": -560.1998291015625,
+ "logps/rejected": -486.14801025390625,
+ "loss": 0.0324,
  "rewards/accuracies": 0.800000011920929,
+ "rewards/chosen": -1.25786554813385,
+ "rewards/margins": 2.4262924194335938,
+ "rewards/rejected": -3.6841578483581543,
  "step": 340
  },
  {
+ "epoch": 0.86,
+ "learning_rate": 2.764152339909756e-08,
+ "logits/chosen": -2.2894670963287354,
+ "logits/rejected": -2.2070441246032715,
+ "logps/chosen": -551.2086181640625,
+ "logps/rejected": -415.3118591308594,
+ "loss": 0.0328,
+ "rewards/accuracies": 0.793749988079071,
+ "rewards/chosen": -1.2593928575515747,
+ "rewards/margins": 1.9064128398895264,
+ "rewards/rejected": -3.1658055782318115,
  "step": 350
  },
  {
+ "epoch": 0.89,
+ "learning_rate": 1.861941317991664e-08,
+ "logits/chosen": -2.3396449089050293,
+ "logits/rejected": -2.227651834487915,
+ "logps/chosen": -571.0888061523438,
+ "logps/rejected": -453.03277587890625,
+ "loss": 0.0325,
+ "rewards/accuracies": 0.8374999761581421,
+ "rewards/chosen": -1.14793860912323,
+ "rewards/margins": 2.2367420196533203,
+ "rewards/rejected": -3.3846805095672607,
  "step": 360
  },
  {
+ "epoch": 0.91,
+ "learning_rate": 1.13197833728636e-08,
+ "logits/chosen": -2.2972564697265625,
+ "logits/rejected": -2.215446710586548,
+ "logps/chosen": -527.4664306640625,
+ "logps/rejected": -465.6924743652344,
+ "loss": 0.0288,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": -1.247899055480957,
+ "rewards/margins": 2.289482355117798,
+ "rewards/rejected": -3.537381410598755,
  "step": 370
  },
  {
+ "epoch": 0.94,
+ "learning_rate": 5.79697505093521e-09,
+ "logits/chosen": -2.293482542037964,
+ "logits/rejected": -2.2097363471984863,
+ "logps/chosen": -540.6966552734375,
+ "logps/rejected": -439.814697265625,
+ "loss": 0.0375,
+ "rewards/accuracies": 0.768750011920929,
+ "rewards/chosen": -1.383996605873108,
+ "rewards/margins": 1.9607197046279907,
+ "rewards/rejected": -3.3447163105010986,
  "step": 380
  },
  {
+ "epoch": 0.96,
+ "learning_rate": 2.092101988131256e-09,
+ "logits/chosen": -2.346567153930664,
+ "logits/rejected": -2.220730781555176,
+ "logps/chosen": -575.7041625976562,
+ "logps/rejected": -463.69427490234375,
+ "loss": 0.0315,
+ "rewards/accuracies": 0.8687499761581421,
+ "rewards/chosen": -1.1256561279296875,
+ "rewards/margins": 2.420063018798828,
+ "rewards/rejected": -3.5457186698913574,
  "step": 390
  },
  {
+ "epoch": 0.99,
+ "learning_rate": 2.327445937151673e-10,
+ "logits/chosen": -2.3339614868164062,
+ "logits/rejected": -2.2517640590667725,
+ "logps/chosen": -568.7457275390625,
+ "logps/rejected": -479.13653564453125,
+ "loss": 0.0331,
+ "rewards/accuracies": 0.831250011920929,
+ "rewards/chosen": -1.2012748718261719,
+ "rewards/margins": 2.3051795959472656,
+ "rewards/rejected": -3.5064544677734375,
  "step": 400
  },
  {
+ "epoch": 0.99,
+ "eval_logits/chosen": -2.3029849529266357,
+ "eval_logits/rejected": -2.282188892364502,
+ "eval_logps/chosen": -494.22357177734375,
+ "eval_logps/rejected": -539.6053466796875,
+ "eval_loss": 0.07123579829931259,
+ "eval_rewards/accuracies": 0.625,
+ "eval_rewards/chosen": -2.3718395233154297,
+ "eval_rewards/margins": 0.45068085193634033,
+ "eval_rewards/rejected": -2.8225200176239014,
+ "eval_runtime": 53.2767,
+ "eval_samples_per_second": 37.54,
+ "eval_steps_per_second": 0.601,
  "step": 400
  },
  {
  "epoch": 1.0,
+ "step": 405,
  "total_flos": 0.0,
+ "train_loss": 0.0722552685457983,
+ "train_runtime": 3732.8792,
+ "train_samples_per_second": 13.902,
+ "train_steps_per_second": 0.108
  }
  ],
  "logging_steps": 10,
+ "max_steps": 405,
  "num_train_epochs": 1,
  "save_steps": 100,
  "total_flos": 0.0,