Commit a27f778 (parent: 1117a9a)
Mark-Arcee committed

Model save

README.md CHANGED
@@ -2,13 +2,12 @@
  license: apache-2.0
  library_name: peft
  tags:
- - alignment-handbook
  - trl
  - sft
  - generated_from_trainer
  base_model: mistralai/Mistral-7B-Instruct-v0.2
  datasets:
- - arcee-ai/SFT-Testing-HF-Format
+ - generator
  model-index:
  - name: zilo-instruct-v2-sft-qlora
    results: []
@@ -19,9 +18,9 @@ should probably proofread and complete it, then remove this comment. -->

  # zilo-instruct-v2-sft-qlora

- This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on the arcee-ai/SFT-Testing-HF-Format dataset.
+ This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on the generator dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.3868
+ - Loss: 0.6246

  ## Model description

@@ -40,7 +39,7 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0002
+ - learning_rate: 2e-05
  - train_batch_size: 4
  - eval_batch_size: 8
  - seed: 42
@@ -56,9 +55,9 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 0.4177 | 0.9981 | 257 | 0.4282 |
- | 0.361 | 2.0 | 515 | 0.3842 |
- | 0.3036 | 2.9942 | 771 | 0.3868 |
+ | 0.7588 | 0.9899 | 49 | 0.7386 |
+ | 0.6201 | 2.0 | 99 | 0.6307 |
+ | 0.6087 | 2.9697 | 147 | 0.6246 |


  ### Framework versions
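
The README above describes a PEFT (QLoRA) adapter for mistralai/Mistral-7B-Instruct-v0.2, with the adapter weights stored in adapter_model.safetensors (the next file in this commit). Below is a minimal sketch of how such an adapter is typically loaded for inference; the adapter repo id is an assumption inferred from the model name and is not stated in the diff.

# Sketch only: load the base model and apply the QLoRA adapter with PEFT.
# The repo id "arcee-ai/zilo-instruct-v2-sft-qlora" is assumed from the model
# name in the card; adjust it to the actual repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "arcee-ai/zilo-instruct-v2-sft-qlora"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # applies adapter_model.safetensors

prompt = "[INST] What license does this model use? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))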
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:77afec2299871a18000ed3025bb5f0e6b73cae36b7b602314d87cc216e37c996
+ oid sha256:e1a7c5ba3953bb0b3141884f066d930329b6e5947c0a580e0ee9b4f54c973f98
  size 83946192
all_results.json CHANGED
@@ -1,14 +1,9 @@
  {
- "epoch": 2.994174757281553,
- "eval_loss": 0.38681623339653015,
- "eval_runtime": 22.0502,
- "eval_samples": 1530,
- "eval_samples_per_second": 4.989,
- "eval_steps_per_second": 0.635,
- "total_flos": 5.421128640186286e+17,
- "train_loss": 0.2601610358741651,
- "train_runtime": 3322.581,
- "train_samples": 29084,
- "train_samples_per_second": 1.86,
- "train_steps_per_second": 0.232
+ "epoch": 2.9696969696969697,
+ "total_flos": 2.0566538541072384e+17,
+ "train_loss": 0.7671454966473742,
+ "train_runtime": 1686.4021,
+ "train_samples": 12338,
+ "train_samples_per_second": 0.699,
+ "train_steps_per_second": 0.087
  }
runs/May25_02-29-05_5c31577a2818/events.out.tfevents.1716604156.5c31577a2818.18685.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:563d134a07485feab6873aee4220a6599db1633608be37a5cd4402ca1c6ef9e1
- size 10279
+ oid sha256:a90ad6eb838a4d61c4be30dc0597b702e1f9dad091f28d7f7d628a084861c6a3
+ size 12783
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
- "epoch": 2.994174757281553,
- "total_flos": 5.421128640186286e+17,
- "train_loss": 0.2601610358741651,
- "train_runtime": 3322.581,
- "train_samples": 29084,
- "train_samples_per_second": 1.86,
- "train_steps_per_second": 0.232
+ "epoch": 2.9696969696969697,
+ "total_flos": 2.0566538541072384e+17,
+ "train_loss": 0.7671454966473742,
+ "train_runtime": 1686.4021,
+ "train_samples": 12338,
+ "train_samples_per_second": 0.699,
+ "train_steps_per_second": 0.087
  }
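
train_results.json above (like all_results.json earlier) summarizes the new run; the per-step metrics behind it live in trainer_state.json, which follows. A small sketch of cross-checking these summary numbers, assuming a local copy of the file, is shown below: train_runtime multiplied by train_steps_per_second should roughly reproduce the final step count (1686.4 s x 0.087 steps/s is about 147 steps).

# Sketch only: cross-check the run summary in train_results.json / all_results.json.
import json

with open("train_results.json") as f:  # assumes a local copy of the file above
    results = json.load(f)

approx_steps = results["train_runtime"] * results["train_steps_per_second"]
print(f"epoch: {results['epoch']:.2f}, train_loss: {results['train_loss']:.4f}")
print(f"approx. optimizer steps: {approx_steps:.0f}")  # about 147 for the new run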
trainer_state.json CHANGED
@@ -1,1134 +1,259 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 2.994174757281553,
5
  "eval_steps": 500,
6
- "global_step": 771,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.003883495145631068,
13
- "grad_norm": 2.65625,
14
- "learning_rate": 2.564102564102564e-06,
15
- "loss": 1.8569,
16
  "step": 1
17
  },
18
  {
19
- "epoch": 0.019417475728155338,
20
- "grad_norm": 2.609375,
21
- "learning_rate": 1.282051282051282e-05,
22
- "loss": 1.8227,
23
  "step": 5
24
  },
25
  {
26
- "epoch": 0.038834951456310676,
27
- "grad_norm": 2.09375,
28
- "learning_rate": 2.564102564102564e-05,
29
- "loss": 1.6509,
30
  "step": 10
31
  },
32
  {
33
- "epoch": 0.05825242718446602,
34
- "grad_norm": 1.21875,
35
- "learning_rate": 3.846153846153846e-05,
36
- "loss": 1.3453,
37
  "step": 15
38
  },
39
  {
40
- "epoch": 0.07766990291262135,
41
- "grad_norm": 0.94921875,
42
- "learning_rate": 5.128205128205128e-05,
43
- "loss": 1.1542,
44
  "step": 20
45
  },
46
  {
47
- "epoch": 0.0970873786407767,
48
- "grad_norm": 0.5546875,
49
- "learning_rate": 6.410256410256412e-05,
50
- "loss": 1.0378,
51
  "step": 25
52
  },
53
  {
54
- "epoch": 0.11650485436893204,
55
- "grad_norm": 0.51171875,
56
- "learning_rate": 7.692307692307693e-05,
57
- "loss": 0.9156,
58
  "step": 30
59
  },
60
  {
61
- "epoch": 0.13592233009708737,
62
- "grad_norm": 0.423828125,
63
- "learning_rate": 8.974358974358975e-05,
64
- "loss": 0.7894,
65
  "step": 35
66
  },
67
  {
68
- "epoch": 0.1553398058252427,
69
- "grad_norm": 0.333984375,
70
- "learning_rate": 0.00010256410256410256,
71
- "loss": 0.7181,
72
  "step": 40
73
  },
74
  {
75
- "epoch": 0.17475728155339806,
76
- "grad_norm": 0.353515625,
77
- "learning_rate": 0.00011538461538461538,
78
- "loss": 0.6847,
79
  "step": 45
80
  },
81
  {
82
- "epoch": 0.1941747572815534,
83
- "grad_norm": 0.392578125,
84
- "learning_rate": 0.00012820512820512823,
85
- "loss": 0.6471,
86
  "step": 50
87
  },
88
  {
89
- "epoch": 0.21359223300970873,
90
- "grad_norm": 0.345703125,
91
- "learning_rate": 0.00014102564102564104,
92
- "loss": 0.6139,
93
  "step": 55
94
  },
95
  {
96
- "epoch": 0.23300970873786409,
97
- "grad_norm": 0.3671875,
98
- "learning_rate": 0.00015384615384615385,
99
- "loss": 0.6034,
100
  "step": 60
101
  },
102
  {
103
- "epoch": 0.2524271844660194,
104
- "grad_norm": 0.376953125,
105
- "learning_rate": 0.0001666666666666667,
106
- "loss": 0.563,
107
  "step": 65
108
  },
109
  {
110
- "epoch": 0.27184466019417475,
111
- "grad_norm": 0.396484375,
112
- "learning_rate": 0.0001794871794871795,
113
- "loss": 0.5629,
114
  "step": 70
115
  },
116
  {
117
- "epoch": 0.2912621359223301,
118
- "grad_norm": 0.365234375,
119
- "learning_rate": 0.00019230769230769233,
120
- "loss": 0.5443,
121
  "step": 75
122
  },
123
  {
124
- "epoch": 0.3106796116504854,
125
- "grad_norm": 0.34375,
126
- "learning_rate": 0.0001999958898251569,
127
- "loss": 0.5345,
128
  "step": 80
129
  },
130
  {
131
- "epoch": 0.3300970873786408,
132
- "grad_norm": 0.33203125,
133
- "learning_rate": 0.00019994965423831854,
134
- "loss": 0.5196,
135
  "step": 85
136
  },
137
  {
138
- "epoch": 0.34951456310679613,
139
- "grad_norm": 0.330078125,
140
- "learning_rate": 0.00019985206917896563,
141
- "loss": 0.5243,
142
  "step": 90
143
  },
144
  {
145
- "epoch": 0.36893203883495146,
146
- "grad_norm": 0.3203125,
147
- "learning_rate": 0.00019970318478175218,
148
- "loss": 0.5218,
149
  "step": 95
150
  },
151
  {
152
- "epoch": 0.3883495145631068,
153
- "grad_norm": 0.28515625,
154
- "learning_rate": 0.00019950307753654017,
155
- "loss": 0.5253,
156
  "step": 100
157
  },
158
  {
159
- "epoch": 0.4077669902912621,
160
- "grad_norm": 0.326171875,
161
- "learning_rate": 0.00019925185024910277,
162
- "loss": 0.5084,
163
  "step": 105
164
  },
165
  {
166
- "epoch": 0.42718446601941745,
167
- "grad_norm": 0.376953125,
168
- "learning_rate": 0.00019894963198830768,
169
- "loss": 0.48,
170
  "step": 110
171
  },
172
  {
173
- "epoch": 0.44660194174757284,
174
- "grad_norm": 0.287109375,
175
- "learning_rate": 0.00019859657801980733,
176
- "loss": 0.4837,
177
  "step": 115
178
  },
179
  {
180
- "epoch": 0.46601941747572817,
181
- "grad_norm": 0.29296875,
182
- "learning_rate": 0.00019819286972627066,
183
- "loss": 0.4751,
184
  "step": 120
185
  },
186
  {
187
- "epoch": 0.4854368932038835,
188
- "grad_norm": 0.28125,
189
- "learning_rate": 0.00019773871451419736,
190
- "loss": 0.4962,
191
  "step": 125
192
  },
193
  {
194
- "epoch": 0.5048543689320388,
195
- "grad_norm": 0.30859375,
196
- "learning_rate": 0.00019723434570736181,
197
- "loss": 0.5064,
198
  "step": 130
199
  },
200
  {
201
- "epoch": 0.5242718446601942,
202
- "grad_norm": 0.283203125,
203
- "learning_rate": 0.00019668002242694238,
204
- "loss": 0.4812,
205
  "step": 135
206
  },
207
  {
208
- "epoch": 0.5436893203883495,
209
- "grad_norm": 0.3671875,
210
- "learning_rate": 0.00019607602945839698,
211
- "loss": 0.4815,
212
  "step": 140
213
  },
214
  {
215
- "epoch": 0.5631067961165048,
216
- "grad_norm": 0.267578125,
217
- "learning_rate": 0.00019542267710515368,
218
- "loss": 0.4763,
219
  "step": 145
220
  },
221
  {
222
- "epoch": 0.5825242718446602,
223
- "grad_norm": 0.287109375,
224
- "learning_rate": 0.000194720301029191,
225
- "loss": 0.4682,
226
- "step": 150
227
- },
228
- {
229
- "epoch": 0.6019417475728155,
230
- "grad_norm": 0.298828125,
231
- "learning_rate": 0.00019396926207859084,
232
- "loss": 0.4604,
233
- "step": 155
234
- },
235
- {
236
- "epoch": 0.6213592233009708,
237
- "grad_norm": 0.26953125,
238
- "learning_rate": 0.00019316994610215116,
239
- "loss": 0.4866,
240
- "step": 160
241
- },
242
- {
243
- "epoch": 0.6407766990291263,
244
- "grad_norm": 0.271484375,
245
- "learning_rate": 0.00019232276375115515,
246
- "loss": 0.4898,
247
- "step": 165
248
- },
249
- {
250
- "epoch": 0.6601941747572816,
251
- "grad_norm": 0.2578125,
252
- "learning_rate": 0.00019142815026839755,
253
- "loss": 0.4908,
254
- "step": 170
255
- },
256
- {
257
- "epoch": 0.6796116504854369,
258
- "grad_norm": 0.28515625,
259
- "learning_rate": 0.0001904865652645773,
260
- "loss": 0.4566,
261
- "step": 175
262
- },
263
- {
264
- "epoch": 0.6990291262135923,
265
- "grad_norm": 0.27734375,
266
- "learning_rate": 0.000189498492482171,
267
- "loss": 0.45,
268
- "step": 180
269
- },
270
- {
271
- "epoch": 0.7184466019417476,
272
- "grad_norm": 0.259765625,
273
- "learning_rate": 0.00018846443954690848,
274
- "loss": 0.4646,
275
- "step": 185
276
- },
277
- {
278
- "epoch": 0.7378640776699029,
279
- "grad_norm": 0.275390625,
280
- "learning_rate": 0.00018738493770697852,
281
- "loss": 0.4625,
282
- "step": 190
283
- },
284
- {
285
- "epoch": 0.7572815533980582,
286
- "grad_norm": 0.26171875,
287
- "learning_rate": 0.00018626054156009806,
288
- "loss": 0.4452,
289
- "step": 195
290
- },
291
- {
292
- "epoch": 0.7766990291262136,
293
- "grad_norm": 0.29296875,
294
- "learning_rate": 0.00018509182876858611,
295
- "loss": 0.4535,
296
- "step": 200
297
- },
298
- {
299
- "epoch": 0.7961165048543689,
300
- "grad_norm": 0.25390625,
301
- "learning_rate": 0.00018387939976258734,
302
- "loss": 0.4393,
303
- "step": 205
304
- },
305
- {
306
- "epoch": 0.8155339805825242,
307
- "grad_norm": 0.263671875,
308
- "learning_rate": 0.0001826238774315995,
309
- "loss": 0.4391,
310
- "step": 210
311
- },
312
- {
313
- "epoch": 0.8349514563106796,
314
- "grad_norm": 0.271484375,
315
- "learning_rate": 0.00018132590680446147,
316
- "loss": 0.443,
317
- "step": 215
318
- },
319
- {
320
- "epoch": 0.8543689320388349,
321
- "grad_norm": 0.279296875,
322
- "learning_rate": 0.00017998615471796775,
323
- "loss": 0.4527,
324
- "step": 220
325
- },
326
- {
327
- "epoch": 0.8737864077669902,
328
- "grad_norm": 0.265625,
329
- "learning_rate": 0.00017860530947427875,
330
- "loss": 0.4416,
331
- "step": 225
332
- },
333
- {
334
- "epoch": 0.8932038834951457,
335
- "grad_norm": 0.2890625,
336
- "learning_rate": 0.00017718408048730317,
337
- "loss": 0.4506,
338
- "step": 230
339
- },
340
- {
341
- "epoch": 0.912621359223301,
342
- "grad_norm": 0.298828125,
343
- "learning_rate": 0.00017572319791823424,
344
- "loss": 0.4538,
345
- "step": 235
346
- },
347
- {
348
- "epoch": 0.9320388349514563,
349
- "grad_norm": 0.263671875,
350
- "learning_rate": 0.000174223412300427,
351
- "loss": 0.4467,
352
- "step": 240
353
- },
354
- {
355
- "epoch": 0.9514563106796117,
356
- "grad_norm": 0.26953125,
357
- "learning_rate": 0.00017268549415380916,
358
- "loss": 0.4425,
359
- "step": 245
360
- },
361
- {
362
- "epoch": 0.970873786407767,
363
- "grad_norm": 0.267578125,
364
- "learning_rate": 0.00017111023358902392,
365
- "loss": 0.4389,
366
- "step": 250
367
- },
368
- {
369
- "epoch": 0.9902912621359223,
370
- "grad_norm": 0.251953125,
371
- "learning_rate": 0.00016949843990150796,
372
- "loss": 0.4177,
373
- "step": 255
374
- },
375
- {
376
- "epoch": 0.9980582524271845,
377
- "eval_loss": 0.4281997084617615,
378
- "eval_runtime": 22.0587,
379
- "eval_samples_per_second": 4.987,
380
- "eval_steps_per_second": 0.635,
381
- "step": 257
382
- },
383
- {
384
- "epoch": 1.0097087378640777,
385
- "grad_norm": 0.236328125,
386
- "learning_rate": 0.00016785094115571322,
387
- "loss": 0.4083,
388
- "step": 260
389
- },
390
- {
391
- "epoch": 1.029126213592233,
392
- "grad_norm": 0.275390625,
393
- "learning_rate": 0.00016616858375968595,
394
- "loss": 0.4012,
395
- "step": 265
396
- },
397
- {
398
- "epoch": 1.0485436893203883,
399
- "grad_norm": 0.28515625,
400
- "learning_rate": 0.00016445223203022166,
401
- "loss": 0.3935,
402
- "step": 270
403
- },
404
- {
405
- "epoch": 1.0679611650485437,
406
- "grad_norm": 0.279296875,
407
- "learning_rate": 0.00016270276774881954,
408
- "loss": 0.3972,
409
- "step": 275
410
- },
411
- {
412
- "epoch": 1.087378640776699,
413
- "grad_norm": 0.267578125,
414
- "learning_rate": 0.00016092108970866423,
415
- "loss": 0.4004,
416
- "step": 280
417
- },
418
- {
419
- "epoch": 1.1067961165048543,
420
- "grad_norm": 0.279296875,
421
- "learning_rate": 0.00015910811325286768,
422
- "loss": 0.3941,
423
- "step": 285
424
- },
425
- {
426
- "epoch": 1.1262135922330097,
427
- "grad_norm": 0.271484375,
428
- "learning_rate": 0.00015726476980420864,
429
- "loss": 0.3877,
430
- "step": 290
431
- },
432
- {
433
- "epoch": 1.145631067961165,
434
- "grad_norm": 0.287109375,
435
- "learning_rate": 0.00015539200638661104,
436
- "loss": 0.3819,
437
- "step": 295
438
- },
439
- {
440
- "epoch": 1.1650485436893203,
441
- "grad_norm": 0.291015625,
442
- "learning_rate": 0.00015349078513860726,
443
- "loss": 0.3816,
444
- "step": 300
445
- },
446
- {
447
- "epoch": 1.1844660194174756,
448
- "grad_norm": 0.318359375,
449
- "learning_rate": 0.00015156208281903613,
450
- "loss": 0.3953,
451
- "step": 305
452
- },
453
- {
454
- "epoch": 1.203883495145631,
455
- "grad_norm": 0.296875,
456
- "learning_rate": 0.0001496068903052299,
457
- "loss": 0.3905,
458
- "step": 310
459
- },
460
- {
461
- "epoch": 1.2233009708737863,
462
- "grad_norm": 0.2734375,
463
- "learning_rate": 0.0001476262120839475,
464
- "loss": 0.3763,
465
- "step": 315
466
- },
467
- {
468
- "epoch": 1.2427184466019416,
469
- "grad_norm": 0.3125,
470
- "learning_rate": 0.0001456210657353163,
471
- "loss": 0.3792,
472
- "step": 320
473
- },
474
- {
475
- "epoch": 1.262135922330097,
476
- "grad_norm": 0.29296875,
477
- "learning_rate": 0.00014359248141004668,
478
- "loss": 0.3794,
479
- "step": 325
480
- },
481
- {
482
- "epoch": 1.2815533980582523,
483
- "grad_norm": 0.294921875,
484
- "learning_rate": 0.00014154150130018866,
485
- "loss": 0.3756,
486
- "step": 330
487
- },
488
- {
489
- "epoch": 1.3009708737864076,
490
- "grad_norm": 0.287109375,
491
- "learning_rate": 0.00013946917910370233,
492
- "loss": 0.3876,
493
- "step": 335
494
- },
495
- {
496
- "epoch": 1.3203883495145632,
497
- "grad_norm": 0.306640625,
498
- "learning_rate": 0.00013737657948311683,
499
- "loss": 0.3819,
500
- "step": 340
501
- },
502
- {
503
- "epoch": 1.3398058252427185,
504
- "grad_norm": 0.26171875,
505
- "learning_rate": 0.00013526477751855644,
506
- "loss": 0.3925,
507
- "step": 345
508
- },
509
- {
510
- "epoch": 1.3592233009708738,
511
- "grad_norm": 0.287109375,
512
- "learning_rate": 0.00013313485815541454,
513
- "loss": 0.3915,
514
- "step": 350
515
- },
516
- {
517
- "epoch": 1.3786407766990292,
518
- "grad_norm": 0.28515625,
519
- "learning_rate": 0.00013098791564695927,
520
- "loss": 0.3902,
521
- "step": 355
522
- },
523
- {
524
- "epoch": 1.3980582524271845,
525
- "grad_norm": 0.2890625,
526
- "learning_rate": 0.0001288250529921571,
527
- "loss": 0.3864,
528
- "step": 360
529
- },
530
- {
531
- "epoch": 1.4174757281553398,
532
- "grad_norm": 0.27734375,
533
- "learning_rate": 0.00012664738136900348,
534
- "loss": 0.3778,
535
- "step": 365
536
- },
537
- {
538
- "epoch": 1.4368932038834952,
539
- "grad_norm": 0.2890625,
540
- "learning_rate": 0.0001244560195636515,
541
- "loss": 0.3732,
542
- "step": 370
543
- },
544
- {
545
- "epoch": 1.4563106796116505,
546
- "grad_norm": 0.318359375,
547
- "learning_rate": 0.00012225209339563145,
548
- "loss": 0.3856,
549
- "step": 375
550
- },
551
- {
552
- "epoch": 1.4757281553398058,
553
- "grad_norm": 0.265625,
554
- "learning_rate": 0.00012003673513945746,
555
- "loss": 0.3745,
556
- "step": 380
557
- },
558
- {
559
- "epoch": 1.4951456310679612,
560
- "grad_norm": 0.306640625,
561
- "learning_rate": 0.0001178110829429175,
562
- "loss": 0.3657,
563
- "step": 385
564
- },
565
- {
566
- "epoch": 1.5145631067961165,
567
- "grad_norm": 0.25390625,
568
- "learning_rate": 0.0001155762802423463,
569
- "loss": 0.3728,
570
- "step": 390
571
- },
572
- {
573
- "epoch": 1.5339805825242718,
574
- "grad_norm": 0.265625,
575
- "learning_rate": 0.0001133334751751809,
576
- "loss": 0.3849,
577
- "step": 395
578
- },
579
- {
580
- "epoch": 1.5533980582524272,
581
- "grad_norm": 0.291015625,
582
- "learning_rate": 0.00011108381999010111,
583
- "loss": 0.3979,
584
- "step": 400
585
- },
586
- {
587
- "epoch": 1.5728155339805825,
588
- "grad_norm": 0.26171875,
589
- "learning_rate": 0.00010882847045505808,
590
- "loss": 0.3819,
591
- "step": 405
592
- },
593
- {
594
- "epoch": 1.5922330097087378,
595
- "grad_norm": 0.2578125,
596
- "learning_rate": 0.00010656858526349449,
597
- "loss": 0.3704,
598
- "step": 410
599
- },
600
- {
601
- "epoch": 1.6116504854368932,
602
- "grad_norm": 0.26171875,
603
- "learning_rate": 0.00010430532543906179,
604
- "loss": 0.3702,
605
- "step": 415
606
- },
607
- {
608
- "epoch": 1.6310679611650487,
609
- "grad_norm": 0.283203125,
610
- "learning_rate": 0.00010203985373914056,
611
- "loss": 0.373,
612
- "step": 420
613
- },
614
- {
615
- "epoch": 1.650485436893204,
616
- "grad_norm": 0.279296875,
617
- "learning_rate": 9.977333405746979e-05,
618
- "loss": 0.366,
619
- "step": 425
620
- },
621
- {
622
- "epoch": 1.6699029126213594,
623
- "grad_norm": 0.30859375,
624
- "learning_rate": 9.750693082619273e-05,
625
- "loss": 0.3555,
626
- "step": 430
627
- },
628
- {
629
- "epoch": 1.6893203883495147,
630
- "grad_norm": 0.279296875,
631
- "learning_rate": 9.524180841762577e-05,
632
- "loss": 0.3793,
633
- "step": 435
634
- },
635
- {
636
- "epoch": 1.70873786407767,
637
- "grad_norm": 0.25390625,
638
- "learning_rate": 9.297913054605838e-05,
639
- "loss": 0.3692,
640
- "step": 440
641
- },
642
- {
643
- "epoch": 1.7281553398058254,
644
- "grad_norm": 0.27734375,
645
- "learning_rate": 9.072005966989084e-05,
646
- "loss": 0.3655,
647
- "step": 445
648
- },
649
- {
650
- "epoch": 1.7475728155339807,
651
- "grad_norm": 0.265625,
652
- "learning_rate": 8.846575639441732e-05,
653
- "loss": 0.3791,
654
- "step": 450
655
- },
656
- {
657
- "epoch": 1.766990291262136,
658
- "grad_norm": 0.29296875,
659
- "learning_rate": 8.621737887556114e-05,
660
- "loss": 0.3661,
661
- "step": 455
662
- },
663
- {
664
- "epoch": 1.7864077669902914,
665
- "grad_norm": 0.2890625,
666
- "learning_rate": 8.397608222486805e-05,
667
- "loss": 0.3611,
668
- "step": 460
669
- },
670
- {
671
- "epoch": 1.8058252427184467,
672
- "grad_norm": 0.291015625,
673
- "learning_rate": 8.174301791606385e-05,
674
- "loss": 0.3611,
675
- "step": 465
676
- },
677
- {
678
- "epoch": 1.825242718446602,
679
- "grad_norm": 0.279296875,
680
- "learning_rate": 7.951933319348095e-05,
681
- "loss": 0.3493,
682
- "step": 470
683
- },
684
- {
685
- "epoch": 1.8446601941747574,
686
- "grad_norm": 0.2578125,
687
- "learning_rate": 7.730617048265761e-05,
688
- "loss": 0.3655,
689
- "step": 475
690
- },
691
- {
692
- "epoch": 1.8640776699029127,
693
- "grad_norm": 0.318359375,
694
- "learning_rate": 7.510466680341301e-05,
695
- "loss": 0.3527,
696
- "step": 480
697
- },
698
- {
699
- "epoch": 1.883495145631068,
700
- "grad_norm": 0.298828125,
701
- "learning_rate": 7.291595318569951e-05,
702
- "loss": 0.3528,
703
- "step": 485
704
- },
705
- {
706
- "epoch": 1.9029126213592233,
707
- "grad_norm": 0.30078125,
708
- "learning_rate": 7.074115408853203e-05,
709
- "loss": 0.3784,
710
- "step": 490
711
- },
712
- {
713
- "epoch": 1.9223300970873787,
714
- "grad_norm": 0.27734375,
715
- "learning_rate": 6.858138682229376e-05,
716
- "loss": 0.3571,
717
- "step": 495
718
- },
719
- {
720
- "epoch": 1.941747572815534,
721
- "grad_norm": 0.265625,
722
- "learning_rate": 6.643776097471377e-05,
723
- "loss": 0.3658,
724
- "step": 500
725
- },
726
- {
727
- "epoch": 1.9611650485436893,
728
- "grad_norm": 0.275390625,
729
- "learning_rate": 6.431137784081282e-05,
730
- "loss": 0.3567,
731
- "step": 505
732
- },
733
- {
734
- "epoch": 1.9805825242718447,
735
- "grad_norm": 0.29296875,
736
- "learning_rate": 6.220332985710936e-05,
737
- "loss": 0.3526,
738
- "step": 510
739
- },
740
- {
741
- "epoch": 2.0,
742
- "grad_norm": 0.30078125,
743
- "learning_rate": 6.011470004037636e-05,
744
- "loss": 0.361,
745
- "step": 515
746
- },
747
- {
748
- "epoch": 2.0,
749
- "eval_loss": 0.38420218229293823,
750
- "eval_runtime": 22.0532,
751
- "eval_samples_per_second": 4.988,
752
- "eval_steps_per_second": 0.635,
753
- "step": 515
754
- },
755
- {
756
- "epoch": 2.0194174757281553,
757
- "grad_norm": 0.2421875,
758
- "learning_rate": 5.804656143123801e-05,
759
- "loss": 0.308,
760
- "step": 520
761
- },
762
- {
763
- "epoch": 2.0388349514563107,
764
- "grad_norm": 0.283203125,
765
- "learning_rate": 5.599997654289129e-05,
766
- "loss": 0.3136,
767
- "step": 525
768
- },
769
- {
770
- "epoch": 2.058252427184466,
771
- "grad_norm": 0.3046875,
772
- "learning_rate": 5.397599681523643e-05,
773
- "loss": 0.3037,
774
- "step": 530
775
- },
776
- {
777
- "epoch": 2.0776699029126213,
778
- "grad_norm": 0.298828125,
779
- "learning_rate": 5.1975662074695865e-05,
780
- "loss": 0.3098,
781
- "step": 535
782
- },
783
- {
784
- "epoch": 2.0970873786407767,
785
- "grad_norm": 0.28125,
786
- "learning_rate": 5.000000000000002e-05,
787
- "loss": 0.3111,
788
- "step": 540
789
- },
790
- {
791
- "epoch": 2.116504854368932,
792
- "grad_norm": 0.294921875,
793
- "learning_rate": 4.8050025594214e-05,
794
- "loss": 0.3083,
795
- "step": 545
796
- },
797
- {
798
- "epoch": 2.1359223300970873,
799
- "grad_norm": 0.29296875,
800
- "learning_rate": 4.6126740663276166e-05,
801
- "loss": 0.3133,
802
- "step": 550
803
- },
804
- {
805
- "epoch": 2.1553398058252426,
806
- "grad_norm": 0.29296875,
807
- "learning_rate": 4.423113330131707e-05,
808
- "loss": 0.3076,
809
- "step": 555
810
- },
811
- {
812
- "epoch": 2.174757281553398,
813
- "grad_norm": 0.29296875,
814
- "learning_rate": 4.236417738302257e-05,
815
- "loss": 0.2981,
816
- "step": 560
817
- },
818
- {
819
- "epoch": 2.1941747572815533,
820
- "grad_norm": 0.314453125,
821
- "learning_rate": 4.052683206330267e-05,
822
- "loss": 0.312,
823
- "step": 565
824
- },
825
- {
826
- "epoch": 2.2135922330097086,
827
- "grad_norm": 0.298828125,
828
- "learning_rate": 3.872004128452231e-05,
829
- "loss": 0.2942,
830
- "step": 570
831
- },
832
- {
833
- "epoch": 2.233009708737864,
834
- "grad_norm": 0.314453125,
835
- "learning_rate": 3.694473329154778e-05,
836
- "loss": 0.3205,
837
- "step": 575
838
- },
839
- {
840
- "epoch": 2.2524271844660193,
841
- "grad_norm": 0.3359375,
842
- "learning_rate": 3.5201820154857755e-05,
843
- "loss": 0.322,
844
- "step": 580
845
- },
846
- {
847
- "epoch": 2.2718446601941746,
848
- "grad_norm": 0.296875,
849
- "learning_rate": 3.3492197301964145e-05,
850
- "loss": 0.2931,
851
- "step": 585
852
- },
853
- {
854
- "epoch": 2.29126213592233,
855
- "grad_norm": 0.33203125,
856
- "learning_rate": 3.18167430573831e-05,
857
- "loss": 0.3107,
858
- "step": 590
859
- },
860
- {
861
- "epoch": 2.3106796116504853,
862
- "grad_norm": 0.302734375,
863
- "learning_rate": 3.0176318191392726e-05,
864
- "loss": 0.3065,
865
- "step": 595
866
- },
867
- {
868
- "epoch": 2.3300970873786406,
869
- "grad_norm": 0.3203125,
870
- "learning_rate": 2.8571765477809643e-05,
871
- "loss": 0.3031,
872
- "step": 600
873
- },
874
- {
875
- "epoch": 2.349514563106796,
876
- "grad_norm": 0.3125,
877
- "learning_rate": 2.7003909261010928e-05,
878
- "loss": 0.2894,
879
- "step": 605
880
- },
881
- {
882
- "epoch": 2.3689320388349513,
883
- "grad_norm": 0.314453125,
884
- "learning_rate": 2.5473555032424533e-05,
885
- "loss": 0.3136,
886
- "step": 610
887
- },
888
- {
889
- "epoch": 2.3883495145631066,
890
- "grad_norm": 0.33984375,
891
- "learning_rate": 2.3981489016705205e-05,
892
- "loss": 0.3078,
893
- "step": 615
894
- },
895
- {
896
- "epoch": 2.407766990291262,
897
- "grad_norm": 0.322265625,
898
- "learning_rate": 2.2528477767808963e-05,
899
- "loss": 0.3094,
900
- "step": 620
901
- },
902
- {
903
- "epoch": 2.4271844660194173,
904
- "grad_norm": 0.31640625,
905
- "learning_rate": 2.1115267775173532e-05,
906
- "loss": 0.2985,
907
- "step": 625
908
- },
909
- {
910
- "epoch": 2.4466019417475726,
911
- "grad_norm": 0.322265625,
912
- "learning_rate": 1.9742585080206755e-05,
913
- "loss": 0.3191,
914
- "step": 630
915
- },
916
- {
917
- "epoch": 2.466019417475728,
918
- "grad_norm": 0.3203125,
919
- "learning_rate": 1.8411134903280303e-05,
920
- "loss": 0.3117,
921
- "step": 635
922
- },
923
- {
924
- "epoch": 2.4854368932038833,
925
- "grad_norm": 0.306640625,
926
- "learning_rate": 1.7121601281420495e-05,
927
- "loss": 0.3086,
928
- "step": 640
929
- },
930
- {
931
- "epoch": 2.5048543689320386,
932
- "grad_norm": 0.33203125,
933
- "learning_rate": 1.587464671688187e-05,
934
- "loss": 0.3072,
935
- "step": 645
936
- },
937
- {
938
- "epoch": 2.524271844660194,
939
- "grad_norm": 0.306640625,
940
- "learning_rate": 1.467091183678444e-05,
941
- "loss": 0.3044,
942
- "step": 650
943
- },
944
- {
945
- "epoch": 2.5436893203883493,
946
- "grad_norm": 0.298828125,
947
- "learning_rate": 1.3511015063989274e-05,
948
- "loss": 0.3081,
949
- "step": 655
950
- },
951
- {
952
- "epoch": 2.5631067961165046,
953
- "grad_norm": 0.275390625,
954
- "learning_rate": 1.2395552299381741e-05,
955
- "loss": 0.3005,
956
- "step": 660
957
- },
958
- {
959
- "epoch": 2.58252427184466,
960
- "grad_norm": 0.3125,
961
- "learning_rate": 1.1325096615725427e-05,
962
- "loss": 0.3034,
963
- "step": 665
964
- },
965
- {
966
- "epoch": 2.6019417475728153,
967
- "grad_norm": 0.3125,
968
- "learning_rate": 1.030019796324404e-05,
969
- "loss": 0.3075,
970
- "step": 670
971
- },
972
- {
973
- "epoch": 2.6213592233009706,
974
- "grad_norm": 0.3125,
975
- "learning_rate": 9.321382887082563e-06,
976
- "loss": 0.3084,
977
- "step": 675
978
- },
979
- {
980
- "epoch": 2.6407766990291264,
981
- "grad_norm": 0.3203125,
982
- "learning_rate": 8.38915425679304e-06,
983
- "loss": 0.3064,
984
- "step": 680
985
- },
986
- {
987
- "epoch": 2.6601941747572817,
988
- "grad_norm": 0.310546875,
989
- "learning_rate": 7.503991007983524e-06,
990
- "loss": 0.3087,
991
- "step": 685
992
- },
993
- {
994
- "epoch": 2.679611650485437,
995
- "grad_norm": 0.3359375,
996
- "learning_rate": 6.666347896263325e-06,
997
- "loss": 0.3129,
998
- "step": 690
999
- },
1000
- {
1001
- "epoch": 2.6990291262135924,
1002
- "grad_norm": 0.291015625,
1003
- "learning_rate": 5.876655263610842e-06,
1004
- "loss": 0.2897,
1005
- "step": 695
1006
- },
1007
- {
1008
- "epoch": 2.7184466019417477,
1009
- "grad_norm": 0.31640625,
1010
- "learning_rate": 5.1353188172838074e-06,
1011
- "loss": 0.3033,
1012
- "step": 700
1013
- },
1014
- {
1015
- "epoch": 2.737864077669903,
1016
- "grad_norm": 0.30859375,
1017
- "learning_rate": 4.442719421385922e-06,
1018
- "loss": 0.2994,
1019
- "step": 705
1020
- },
1021
- {
1022
- "epoch": 2.7572815533980584,
1023
- "grad_norm": 0.330078125,
1024
- "learning_rate": 3.7992129011965803e-06,
1025
- "loss": 0.3152,
1026
- "step": 710
1027
- },
1028
- {
1029
- "epoch": 2.7766990291262137,
1030
- "grad_norm": 0.314453125,
1031
- "learning_rate": 3.2051298603643753e-06,
1032
- "loss": 0.3101,
1033
- "step": 715
1034
- },
1035
- {
1036
- "epoch": 2.796116504854369,
1037
- "grad_norm": 0.30859375,
1038
- "learning_rate": 2.6607755110584887e-06,
1039
- "loss": 0.3125,
1040
- "step": 720
1041
- },
1042
- {
1043
- "epoch": 2.8155339805825244,
1044
- "grad_norm": 0.291015625,
1045
- "learning_rate": 2.1664295171648364e-06,
1046
- "loss": 0.3024,
1047
- "step": 725
1048
- },
1049
- {
1050
- "epoch": 2.8349514563106797,
1051
- "grad_norm": 0.31640625,
1052
- "learning_rate": 1.7223458506077316e-06,
1053
- "loss": 0.2913,
1054
- "step": 730
1055
- },
1056
- {
1057
- "epoch": 2.854368932038835,
1058
- "grad_norm": 0.3046875,
1059
- "learning_rate": 1.3287526608711131e-06,
1060
- "loss": 0.2968,
1061
- "step": 735
1062
- },
1063
- {
1064
- "epoch": 2.8737864077669903,
1065
- "grad_norm": 0.3203125,
1066
- "learning_rate": 9.85852157785816e-07,
1067
- "loss": 0.3033,
1068
- "step": 740
1069
- },
1070
- {
1071
- "epoch": 2.8932038834951457,
1072
- "grad_norm": 0.310546875,
1073
- "learning_rate": 6.938205076436832e-07,
1074
- "loss": 0.2991,
1075
- "step": 745
1076
- },
1077
- {
1078
- "epoch": 2.912621359223301,
1079
- "grad_norm": 0.328125,
1080
- "learning_rate": 4.5280774269154115e-07,
1081
- "loss": 0.3003,
1082
- "step": 750
1083
- },
1084
- {
1085
- "epoch": 2.9320388349514563,
1086
- "grad_norm": 0.2890625,
1087
- "learning_rate": 2.629376840515452e-07,
1088
- "loss": 0.2999,
1089
- "step": 755
1090
- },
1091
- {
1092
- "epoch": 2.9514563106796117,
1093
- "grad_norm": 0.32421875,
1094
- "learning_rate": 1.2430787810776555e-07,
1095
- "loss": 0.3023,
1096
- "step": 760
1097
- },
1098
- {
1099
- "epoch": 2.970873786407767,
1100
- "grad_norm": 0.326171875,
1101
- "learning_rate": 3.6989546391297256e-08,
1102
- "loss": 0.3106,
1103
- "step": 765
1104
- },
1105
- {
1106
- "epoch": 2.9902912621359223,
1107
- "grad_norm": 0.318359375,
1108
- "learning_rate": 1.0275489900624102e-09,
1109
- "loss": 0.3036,
1110
- "step": 770
1111
- },
1112
- {
1113
- "epoch": 2.994174757281553,
1114
- "eval_loss": 0.38681623339653015,
1115
- "eval_runtime": 22.0402,
1116
- "eval_samples_per_second": 4.991,
1117
- "eval_steps_per_second": 0.635,
1118
- "step": 771
1119
  },
1120
  {
1121
- "epoch": 2.994174757281553,
1122
- "step": 771,
1123
- "total_flos": 5.421128640186286e+17,
1124
- "train_loss": 0.2601610358741651,
1125
- "train_runtime": 3322.581,
1126
- "train_samples_per_second": 1.86,
1127
- "train_steps_per_second": 0.232
1128
  }
1129
  ],
1130
  "logging_steps": 5,
1131
- "max_steps": 771,
1132
  "num_input_tokens_seen": 0,
1133
  "num_train_epochs": 3,
1134
  "save_steps": 100,
@@ -1144,7 +269,7 @@
1144
  "attributes": {}
1145
  }
1146
  },
1147
- "total_flos": 5.421128640186286e+17,
1148
  "train_batch_size": 4,
1149
  "trial_name": null,
1150
  "trial_params": null
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 2.9696969696969697,
5
  "eval_steps": 500,
6
+ "global_step": 147,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.020202020202020204,
13
+ "grad_norm": 2.578125,
14
+ "learning_rate": 1.3333333333333334e-06,
15
+ "loss": 1.5448,
16
  "step": 1
17
  },
18
  {
19
+ "epoch": 0.10101010101010101,
20
+ "grad_norm": 2.546875,
21
+ "learning_rate": 6.666666666666667e-06,
22
+ "loss": 1.5503,
23
  "step": 5
24
  },
25
  {
26
+ "epoch": 0.20202020202020202,
27
+ "grad_norm": 2.390625,
28
+ "learning_rate": 1.3333333333333333e-05,
29
+ "loss": 1.469,
30
  "step": 10
31
  },
32
  {
33
+ "epoch": 0.30303030303030304,
34
+ "grad_norm": 1.65625,
35
+ "learning_rate": 2e-05,
36
+ "loss": 1.2465,
37
  "step": 15
38
  },
39
  {
40
+ "epoch": 0.40404040404040403,
41
+ "grad_norm": 1.203125,
42
+ "learning_rate": 1.9929278846732883e-05,
43
+ "loss": 1.0305,
44
  "step": 20
45
  },
46
  {
47
+ "epoch": 0.5050505050505051,
48
+ "grad_norm": 0.953125,
49
+ "learning_rate": 1.9718115683235418e-05,
50
+ "loss": 0.9205,
51
  "step": 25
52
  },
53
  {
54
+ "epoch": 0.6060606060606061,
55
+ "grad_norm": 0.609375,
56
+ "learning_rate": 1.936949724999762e-05,
57
+ "loss": 0.8521,
58
  "step": 30
59
  },
60
  {
61
+ "epoch": 0.7070707070707071,
62
+ "grad_norm": 0.48046875,
63
+ "learning_rate": 1.8888354486549238e-05,
64
+ "loss": 0.8354,
65
  "step": 35
66
  },
67
  {
68
+ "epoch": 0.8080808080808081,
69
+ "grad_norm": 0.478515625,
70
+ "learning_rate": 1.8281492787113707e-05,
71
+ "loss": 0.7916,
72
  "step": 40
73
  },
74
  {
75
+ "epoch": 0.9090909090909091,
76
+ "grad_norm": 0.46484375,
77
+ "learning_rate": 1.7557495743542586e-05,
78
+ "loss": 0.7588,
79
  "step": 45
80
  },
81
  {
82
+ "epoch": 0.98989898989899,
83
+ "eval_loss": 0.7386298179626465,
84
+ "eval_runtime": 17.3037,
85
+ "eval_samples_per_second": 2.485,
86
+ "eval_steps_per_second": 0.347,
87
+ "step": 49
88
+ },
89
+ {
90
+ "epoch": 1.0101010101010102,
91
+ "grad_norm": 0.5390625,
92
+ "learning_rate": 1.6726603737012527e-05,
93
+ "loss": 0.7461,
94
  "step": 50
95
  },
96
  {
97
+ "epoch": 1.1111111111111112,
98
+ "grad_norm": 0.5078125,
99
+ "learning_rate": 1.5800569095711983e-05,
100
+ "loss": 0.715,
101
  "step": 55
102
  },
103
  {
104
+ "epoch": 1.2121212121212122,
105
+ "grad_norm": 0.3046875,
106
+ "learning_rate": 1.479248986720057e-05,
107
+ "loss": 0.6996,
108
  "step": 60
109
  },
110
  {
111
+ "epoch": 1.3131313131313131,
112
+ "grad_norm": 0.28125,
113
+ "learning_rate": 1.3716624556603275e-05,
114
+ "loss": 0.6809,
115
  "step": 65
116
  },
117
  {
118
+ "epoch": 1.4141414141414141,
119
+ "grad_norm": 0.318359375,
120
+ "learning_rate": 1.2588190451025209e-05,
121
+ "loss": 0.6585,
122
  "step": 70
123
  },
124
  {
125
+ "epoch": 1.5151515151515151,
126
+ "grad_norm": 0.2255859375,
127
+ "learning_rate": 1.1423148382732854e-05,
128
+ "loss": 0.6599,
129
  "step": 75
130
  },
131
  {
132
+ "epoch": 1.6161616161616161,
133
+ "grad_norm": 0.2431640625,
134
+ "learning_rate": 1.0237976975461074e-05,
135
+ "loss": 0.646,
136
  "step": 80
137
  },
138
  {
139
+ "epoch": 1.7171717171717171,
140
+ "grad_norm": 0.203125,
141
+ "learning_rate": 9.049439566958176e-06,
142
+ "loss": 0.6349,
143
  "step": 85
144
  },
145
  {
146
+ "epoch": 1.8181818181818183,
147
+ "grad_norm": 0.2392578125,
148
+ "learning_rate": 7.874347104470234e-06,
149
+ "loss": 0.6352,
150
  "step": 90
151
  },
152
  {
153
+ "epoch": 1.9191919191919191,
154
+ "grad_norm": 0.2060546875,
155
+ "learning_rate": 6.729320366825785e-06,
156
+ "loss": 0.6201,
157
  "step": 95
158
  },
159
  {
160
+ "epoch": 2.0,
161
+ "eval_loss": 0.6306869983673096,
162
+ "eval_runtime": 17.31,
163
+ "eval_samples_per_second": 2.484,
164
+ "eval_steps_per_second": 0.347,
165
+ "step": 99
166
+ },
167
+ {
168
+ "epoch": 2.0202020202020203,
169
+ "grad_norm": 0.2001953125,
170
+ "learning_rate": 5.630554876306407e-06,
171
+ "loss": 0.6233,
172
  "step": 100
173
  },
174
  {
175
+ "epoch": 2.121212121212121,
176
+ "grad_norm": 0.2021484375,
177
+ "learning_rate": 4.593591825444028e-06,
178
+ "loss": 0.6138,
179
  "step": 105
180
  },
181
  {
182
+ "epoch": 2.2222222222222223,
183
+ "grad_norm": 0.2275390625,
184
+ "learning_rate": 3.633098258809119e-06,
185
+ "loss": 0.6218,
186
  "step": 110
187
  },
188
  {
189
+ "epoch": 2.323232323232323,
190
+ "grad_norm": 0.2021484375,
191
+ "learning_rate": 2.7626596189492983e-06,
192
+ "loss": 0.6201,
193
  "step": 115
194
  },
195
  {
196
+ "epoch": 2.4242424242424243,
197
+ "grad_norm": 0.19921875,
198
+ "learning_rate": 1.994587590756397e-06,
199
+ "loss": 0.6164,
200
  "step": 120
201
  },
202
  {
203
+ "epoch": 2.525252525252525,
204
+ "grad_norm": 0.2041015625,
205
+ "learning_rate": 1.339745962155613e-06,
206
+ "loss": 0.6157,
207
  "step": 125
208
  },
209
  {
210
+ "epoch": 2.6262626262626263,
211
+ "grad_norm": 0.216796875,
212
+ "learning_rate": 8.073969641833446e-07,
213
+ "loss": 0.6151,
214
  "step": 130
215
  },
216
  {
217
+ "epoch": 2.7272727272727275,
218
+ "grad_norm": 0.267578125,
219
+ "learning_rate": 4.0507026385502747e-07,
220
+ "loss": 0.6097,
221
  "step": 135
222
  },
223
  {
224
+ "epoch": 2.8282828282828283,
225
+ "grad_norm": 0.208984375,
226
+ "learning_rate": 1.3845646281813508e-07,
227
+ "loss": 0.617,
228
  "step": 140
229
  },
230
  {
231
+ "epoch": 2.929292929292929,
232
+ "grad_norm": 0.1943359375,
233
+ "learning_rate": 1.1326608169920373e-08,
234
+ "loss": 0.6087,
235
  "step": 145
236
  },
237
  {
238
+ "epoch": 2.9696969696969697,
239
+ "eval_loss": 0.6245681643486023,
240
+ "eval_runtime": 17.3067,
241
+ "eval_samples_per_second": 2.485,
242
+ "eval_steps_per_second": 0.347,
243
+ "step": 147
244
  },
245
  {
246
+ "epoch": 2.9696969696969697,
247
+ "step": 147,
248
+ "total_flos": 2.0566538541072384e+17,
249
+ "train_loss": 0.7671454966473742,
250
+ "train_runtime": 1686.4021,
251
+ "train_samples_per_second": 0.699,
252
+ "train_steps_per_second": 0.087
253
  }
254
  ],
255
  "logging_steps": 5,
256
+ "max_steps": 147,
257
  "num_input_tokens_seen": 0,
258
  "num_train_epochs": 3,
259
  "save_steps": 100,
 
269
  "attributes": {}
270
  }
271
  },
272
+ "total_flos": 2.0566538541072384e+17,
273
  "train_batch_size": 4,
274
  "trial_name": null,
275
  "trial_params": null