chansung committed
Commit 600f26e
1 parent: aaddb93

Model save

README.md CHANGED
@@ -2,13 +2,12 @@
  license: gemma
  library_name: peft
  tags:
- - alignment-handbook
  - trl
  - sft
  - generated_from_trainer
  base_model: google/gemma-2b
  datasets:
- - llama-duo/synth_summarize_dataset_dedup
+ - generator
  model-index:
  - name: gemma2b-summarize-gpt4o-64k
    results: []
@@ -19,9 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
 
  # gemma2b-summarize-gpt4o-64k
 
- This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the llama-duo/synth_summarize_dataset_dedup dataset.
+ This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the generator dataset.
  It achieves the following results on the evaluation set:
- - Loss: 2.5990
+ - Loss: 2.6852
 
  ## Model description
 
@@ -41,38 +40,33 @@ More information needed
 
  The following hyperparameters were used during training:
  - learning_rate: 0.0002
- - train_batch_size: 8
- - eval_batch_size: 8
+ - train_batch_size: 16
+ - eval_batch_size: 16
  - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 3
  - gradient_accumulation_steps: 2
- - total_train_batch_size: 48
- - total_eval_batch_size: 24
+ - total_train_batch_size: 96
+ - total_eval_batch_size: 48
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 15
+ - num_epochs: 10
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 1.2959 | 1.0 | 146 | 2.5295 |
- | 1.1524 | 2.0 | 292 | 2.4913 |
- | 1.1138 | 3.0 | 438 | 2.4847 |
- | 1.0703 | 4.0 | 584 | 2.4927 |
- | 1.0423 | 5.0 | 730 | 2.5080 |
- | 1.0322 | 6.0 | 876 | 2.5202 |
- | 1.0113 | 7.0 | 1022 | 2.5385 |
- | 0.9857 | 8.0 | 1168 | 2.5522 |
- | 0.9865 | 9.0 | 1314 | 2.5657 |
- | 0.9691 | 10.0 | 1460 | 2.5774 |
- | 0.952 | 11.0 | 1606 | 2.5889 |
- | 0.97 | 12.0 | 1752 | 2.5957 |
- | 0.9514 | 13.0 | 1898 | 2.5988 |
- | 0.9469 | 14.0 | 2044 | 2.5997 |
- | 0.9469 | 15.0 | 2190 | 2.5990 |
+ | 1.2474 | 1.0 | 146 | 2.5237 |
+ | 1.1269 | 2.0 | 292 | 2.4805 |
+ | 1.0909 | 3.0 | 438 | 2.4893 |
+ | 1.0354 | 4.0 | 584 | 2.5017 |
+ | 1.0016 | 5.0 | 730 | 2.5295 |
+ | 0.9823 | 6.0 | 876 | 2.5500 |
+ | 0.955 | 7.0 | 1022 | 2.5866 |
+ | 0.9214 | 8.0 | 1168 | 2.6224 |
+ | 0.913 | 9.0 | 1314 | 2.6512 |
+ | 0.889 | 10.0 | 1460 | 2.6852 |
 
 
  ### Framework versions
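
For readers who want to see how the updated hyperparameters fit together, a minimal sketch of an equivalent `transformers` `TrainingArguments` follows; the card's `trl`/`sft` tags indicate the run used TRL's SFT trainer, so this is only an illustrative mapping, not the training script behind this commit. Note that the reported `total_train_batch_size` of 96 follows from 16 per device × 3 GPUs × 2 gradient-accumulation steps.

```python
from transformers import TrainingArguments

# Illustrative mapping of the card's updated hyperparameters;
# not the actual training script used for this commit.
args = TrainingArguments(
    output_dir="gemma2b-summarize-gpt4o-64k",
    learning_rate=2e-4,
    per_device_train_batch_size=16,   # "train_batch_size: 16"
    per_device_eval_batch_size=16,    # "eval_batch_size: 16"
    gradient_accumulation_steps=2,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# Effective (total) train batch size reported in the card:
# 16 per device * 3 GPUs * 2 accumulation steps = 96
```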
adapter_config.json CHANGED
@@ -20,13 +20,13 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "up_proj",
- "q_proj",
  "down_proj",
+ "o_proj",
  "v_proj",
  "gate_proj",
- "o_proj",
- "k_proj"
+ "k_proj",
+ "q_proj",
+ "up_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
all_results.json CHANGED
@@ -1,14 +1,9 @@
  {
- "epoch": 15.0,
- "eval_loss": 2.5990421772003174,
- "eval_runtime": 0.5314,
- "eval_samples": 25,
- "eval_samples_per_second": 18.818,
- "eval_steps_per_second": 1.882,
- "total_flos": 1.2863476116823736e+18,
- "train_loss": 1.080002195214572,
- "train_runtime": 11705.6736,
+ "epoch": 10.273972602739725,
+ "total_flos": 8.853977907740017e+17,
+ "train_loss": 0.0,
+ "train_runtime": 2.8738,
  "train_samples": 64610,
- "train_samples_per_second": 8.973,
- "train_steps_per_second": 0.187
+ "train_samples_per_second": 24365.255,
+ "train_steps_per_second": 508.044
  }
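
The updated `all_results.json` drops the `eval_*` fields, so evaluation has to be rerun to reproduce numbers like the loss reported on the card. A minimal sketch of loading the saved adapter on top of the base model with `peft` is shown below; the adapter repo id is inferred from the model name on this card and is an assumption.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "google/gemma-2b"
# Adapter repo id inferred from the model name on this card; adjust if different.
ADAPTER = "llama-duo/gemma2b-summarize-gpt4o-64k"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE)
model = PeftModel.from_pretrained(base_model, ADAPTER)

prompt = "Summarize: ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```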
runs/Jun10_11-43-03_user-HP-Z8-Fury-G5-Workstation-Desktop-PC/events.out.tfevents.1717987400.user-HP-Z8-Fury-G5-Workstation-Desktop-PC.73754.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d74e4849807defff42ecab2ab67df523361891571af12ed8b8f60fc95c039f69
- size 71714
+ oid sha256:d1586b8aacbaee33c4b5a35a3ebcad1ef834b46c289f437e3d88dc880170926c
+ size 72136
runs/Jun10_14-00-49_user-HP-Z8-Fury-G5-Workstation-Desktop-PC/events.out.tfevents.1717995664.user-HP-Z8-Fury-G5-Workstation-Desktop-PC.77116.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eca5cc643fd868cfbd2abee5fdc3b9696c593af3b2e02ff5bd0c6c2f8aeae47a
+ size 5953
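
Both event files are TensorBoard logs stored as Git LFS pointers. Once the LFS objects are pulled, they can be opened with TensorBoard or read programmatically; a minimal sketch using the run directory added in this commit follows (scalar tag names depend on the Trainer version, so list them before reading a specific series).

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point at the run directory added in this commit (pull the LFS objects first).
run_dir = "runs/Jun10_14-00-49_user-HP-Z8-Fury-G5-Workstation-Desktop-PC"
acc = EventAccumulator(run_dir)
acc.Reload()

print(acc.Tags()["scalars"])   # discover the available scalar tags first
for event in acc.Scalars("train/loss"):  # tag name assumed; pick one from the list above
    print(event.step, event.value)
```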
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
- "epoch": 15.0,
- "total_flos": 1.2863476116823736e+18,
- "train_loss": 1.080002195214572,
- "train_runtime": 11705.6736,
+ "epoch": 10.273972602739725,
+ "total_flos": 8.853977907740017e+17,
+ "train_loss": 0.0,
+ "train_runtime": 2.8738,
  "train_samples": 64610,
- "train_samples_per_second": 8.973,
- "train_steps_per_second": 0.187
+ "train_samples_per_second": 24365.255,
+ "train_steps_per_second": 508.044
  }
trainer_state.json CHANGED
@@ -1,3220 +1,2214 @@
  {
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 15.0,
  "eval_steps": 500,
- "global_step": 2190,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
  "epoch": 0.00684931506849315,
- "grad_norm": 2.453125,
  "learning_rate": 9.132420091324201e-07,
  "loss": 3.0017,
  "step": 1
  },
  {
  "epoch": 0.03424657534246575,
- "grad_norm": 3.34375,
  "learning_rate": 4.566210045662101e-06,
- "loss": 3.0722,
  "step": 5
  },
[... the `log_history` diff continues in this pattern for every fifth logging step and every epoch-end evaluation entry: the previous 15-epoch run's "grad_norm", "loss", "eval_loss", "eval_runtime", "eval_samples_per_second", and "eval_steps_per_second" values are marked as removed, while "epoch", "learning_rate", and "step" remain as context. The captured diff breaks off, truncated, around epoch 10.27 / step 1500 of the old history.]
2199
- "epoch": 10.308219178082192,
2200
- "grad_norm": 0.435546875,
2201
- "learning_rate": 5.391352492737157e-05,
2202
- "loss": 0.956,
2203
- "step": 1505
2204
- },
2205
- {
2206
- "epoch": 10.342465753424657,
2207
- "grad_norm": 0.412109375,
2208
- "learning_rate": 5.3207722895979406e-05,
2209
- "loss": 0.9602,
2210
- "step": 1510
2211
- },
2212
- {
2213
- "epoch": 10.376712328767123,
2214
- "grad_norm": 0.41015625,
2215
- "learning_rate": 5.2504892793295e-05,
2216
- "loss": 0.9607,
2217
- "step": 1515
2218
- },
2219
- {
2220
- "epoch": 10.41095890410959,
2221
- "grad_norm": 0.5,
2222
- "learning_rate": 5.1805079258329056e-05,
2223
- "loss": 0.9658,
2224
- "step": 1520
2225
- },
2226
- {
2227
- "epoch": 10.445205479452055,
2228
- "grad_norm": 0.5078125,
2229
- "learning_rate": 5.110832673850039e-05,
2230
- "loss": 0.9648,
2231
- "step": 1525
2232
- },
2233
- {
2234
- "epoch": 10.479452054794521,
2235
- "grad_norm": 0.451171875,
2236
- "learning_rate": 5.041467948681269e-05,
2237
- "loss": 0.9732,
2238
- "step": 1530
2239
- },
2240
- {
2241
- "epoch": 10.513698630136986,
2242
- "grad_norm": 0.439453125,
2243
- "learning_rate": 4.9724181559044234e-05,
2244
- "loss": 0.9681,
2245
- "step": 1535
2246
- },
2247
- {
2248
- "epoch": 10.547945205479452,
2249
- "grad_norm": 0.404296875,
2250
- "learning_rate": 4.903687681094942e-05,
2251
- "loss": 0.9639,
2252
- "step": 1540
2253
- },
2254
- {
2255
- "epoch": 10.582191780821917,
2256
- "grad_norm": 0.412109375,
2257
- "learning_rate": 4.835280889547351e-05,
2258
- "loss": 0.9579,
2259
- "step": 1545
2260
- },
2261
- {
2262
- "epoch": 10.616438356164384,
2263
- "grad_norm": 0.41015625,
2264
- "learning_rate": 4.767202125998005e-05,
2265
- "loss": 0.9553,
2266
- "step": 1550
2267
- },
2268
- {
2269
- "epoch": 10.650684931506849,
2270
- "grad_norm": 0.41796875,
2271
- "learning_rate": 4.699455714349152e-05,
2272
- "loss": 0.9512,
2273
- "step": 1555
2274
- },
2275
- {
2276
- "epoch": 10.684931506849315,
2277
- "grad_norm": 0.416015625,
2278
- "learning_rate": 4.6320459573942856e-05,
2279
- "loss": 0.968,
2280
- "step": 1560
2281
- },
2282
- {
2283
- "epoch": 10.719178082191782,
2284
- "grad_norm": 0.416015625,
2285
- "learning_rate": 4.564977136544873e-05,
2286
- "loss": 0.9617,
2287
- "step": 1565
2288
- },
2289
- {
2290
- "epoch": 10.753424657534246,
2291
- "grad_norm": 0.421875,
2292
- "learning_rate": 4.49825351155843e-05,
2293
- "loss": 0.9749,
2294
- "step": 1570
2295
- },
2296
- {
2297
- "epoch": 10.787671232876713,
2298
- "grad_norm": 0.44140625,
2299
- "learning_rate": 4.431879320267972e-05,
2300
- "loss": 0.9662,
2301
- "step": 1575
2302
- },
2303
- {
2304
- "epoch": 10.821917808219178,
2305
- "grad_norm": 0.412109375,
2306
- "learning_rate": 4.3658587783128425e-05,
2307
- "loss": 0.9661,
2308
- "step": 1580
2309
- },
2310
- {
2311
- "epoch": 10.856164383561644,
2312
- "grad_norm": 0.4609375,
2313
- "learning_rate": 4.300196078870982e-05,
2314
- "loss": 0.9571,
2315
- "step": 1585
2316
- },
2317
- {
2318
- "epoch": 10.89041095890411,
2319
- "grad_norm": 0.41015625,
2320
- "learning_rate": 4.2348953923925916e-05,
2321
- "loss": 0.9657,
2322
- "step": 1590
2323
- },
2324
- {
2325
- "epoch": 10.924657534246576,
2326
- "grad_norm": 0.43359375,
2327
- "learning_rate": 4.16996086633526e-05,
2328
- "loss": 0.9587,
2329
- "step": 1595
2330
- },
2331
- {
2332
- "epoch": 10.95890410958904,
2333
- "grad_norm": 0.390625,
2334
- "learning_rate": 4.105396624900538e-05,
2335
- "loss": 0.9548,
2336
- "step": 1600
2337
- },
2338
- {
2339
- "epoch": 10.993150684931507,
2340
- "grad_norm": 0.427734375,
2341
- "learning_rate": 4.041206768772022e-05,
2342
- "loss": 0.952,
2343
- "step": 1605
2344
- },
2345
- {
2346
- "epoch": 11.0,
2347
- "eval_loss": 2.5889077186584473,
2348
- "eval_runtime": 0.5538,
2349
- "eval_samples_per_second": 18.057,
2350
- "eval_steps_per_second": 1.806,
2351
- "step": 1606
2352
- },
2353
- {
2354
- "epoch": 11.027397260273972,
2355
- "grad_norm": 0.416015625,
2356
- "learning_rate": 3.977395374854871e-05,
2357
- "loss": 0.9434,
2358
- "step": 1610
2359
- },
2360
- {
2361
- "epoch": 11.061643835616438,
2362
- "grad_norm": 0.419921875,
2363
- "learning_rate": 3.913966496016891e-05,
2364
- "loss": 0.9579,
2365
- "step": 1615
2366
- },
2367
- {
2368
- "epoch": 11.095890410958905,
2369
- "grad_norm": 0.453125,
2370
- "learning_rate": 3.850924160831115e-05,
2371
- "loss": 0.9524,
2372
- "step": 1620
2373
- },
2374
- {
2375
- "epoch": 11.13013698630137,
2376
- "grad_norm": 0.42578125,
2377
- "learning_rate": 3.788272373319955e-05,
2378
- "loss": 0.9618,
2379
- "step": 1625
2380
- },
2381
- {
2382
- "epoch": 11.164383561643836,
2383
- "grad_norm": 0.40625,
2384
- "learning_rate": 3.726015112700859e-05,
2385
- "loss": 0.9503,
2386
- "step": 1630
2387
- },
2388
- {
2389
- "epoch": 11.198630136986301,
2390
- "grad_norm": 0.41796875,
2391
- "learning_rate": 3.6641563331336125e-05,
2392
- "loss": 0.9475,
2393
- "step": 1635
2394
- },
2395
- {
2396
- "epoch": 11.232876712328768,
2397
- "grad_norm": 0.41796875,
2398
- "learning_rate": 3.6026999634691725e-05,
2399
- "loss": 0.945,
2400
- "step": 1640
2401
- },
2402
- {
2403
- "epoch": 11.267123287671232,
2404
- "grad_norm": 0.40234375,
2405
- "learning_rate": 3.541649907000147e-05,
2406
- "loss": 0.9528,
2407
- "step": 1645
2408
- },
2409
- {
2410
- "epoch": 11.301369863013699,
2411
- "grad_norm": 0.423828125,
2412
- "learning_rate": 3.4810100412128747e-05,
2413
- "loss": 0.946,
2414
- "step": 1650
2415
- },
2416
- {
2417
- "epoch": 11.335616438356164,
2418
- "grad_norm": 0.404296875,
2419
- "learning_rate": 3.42078421754117e-05,
2420
- "loss": 0.9535,
2421
- "step": 1655
2422
- },
2423
- {
2424
- "epoch": 11.36986301369863,
2425
- "grad_norm": 0.41015625,
2426
- "learning_rate": 3.360976261121684e-05,
2427
- "loss": 0.962,
2428
- "step": 1660
2429
- },
2430
- {
2431
- "epoch": 11.404109589041095,
2432
- "grad_norm": 0.412109375,
2433
- "learning_rate": 3.3015899705509734e-05,
2434
- "loss": 0.9503,
2435
- "step": 1665
2436
- },
2437
- {
2438
- "epoch": 11.438356164383562,
2439
- "grad_norm": 0.423828125,
2440
- "learning_rate": 3.242629117644229e-05,
2441
- "loss": 0.9564,
2442
- "step": 1670
2443
- },
2444
- {
2445
- "epoch": 11.472602739726028,
2446
- "grad_norm": 0.41796875,
2447
- "learning_rate": 3.184097447195732e-05,
2448
- "loss": 0.9714,
2449
- "step": 1675
2450
- },
2451
- {
2452
- "epoch": 11.506849315068493,
2453
- "grad_norm": 0.404296875,
2454
- "learning_rate": 3.125998676740987e-05,
2455
- "loss": 0.9516,
2456
- "step": 1680
2457
- },
2458
- {
2459
- "epoch": 11.54109589041096,
2460
- "grad_norm": 0.416015625,
2461
- "learning_rate": 3.068336496320631e-05,
2462
- "loss": 0.9634,
2463
- "step": 1685
2464
- },
2465
- {
2466
- "epoch": 11.575342465753424,
2467
- "grad_norm": 0.419921875,
2468
- "learning_rate": 3.0111145682460507e-05,
2469
- "loss": 0.9578,
2470
- "step": 1690
2471
- },
2472
- {
2473
- "epoch": 11.60958904109589,
2474
- "grad_norm": 0.412109375,
2475
- "learning_rate": 2.9543365268667867e-05,
2476
- "loss": 0.9551,
2477
- "step": 1695
2478
- },
2479
- {
2480
- "epoch": 11.643835616438356,
2481
- "grad_norm": 0.404296875,
2482
- "learning_rate": 2.8980059783396953e-05,
2483
- "loss": 0.9585,
2484
- "step": 1700
2485
- },
2486
- {
2487
- "epoch": 11.678082191780822,
2488
- "grad_norm": 0.390625,
2489
- "learning_rate": 2.8421265003999286e-05,
2490
- "loss": 0.9573,
2491
- "step": 1705
2492
- },
2493
- {
2494
- "epoch": 11.712328767123287,
2495
- "grad_norm": 0.4140625,
2496
- "learning_rate": 2.7867016421336776e-05,
2497
- "loss": 0.9544,
2498
- "step": 1710
2499
- },
2500
- {
2501
- "epoch": 11.746575342465754,
2502
- "grad_norm": 0.396484375,
2503
- "learning_rate": 2.7317349237527724e-05,
2504
- "loss": 0.9432,
2505
- "step": 1715
2506
- },
2507
- {
2508
- "epoch": 11.780821917808218,
2509
- "grad_norm": 0.41015625,
2510
- "learning_rate": 2.6772298363710956e-05,
2511
- "loss": 0.9567,
2512
- "step": 1720
2513
- },
2514
- {
2515
- "epoch": 11.815068493150685,
2516
- "grad_norm": 0.431640625,
2517
- "learning_rate": 2.6231898417828603e-05,
2518
- "loss": 0.9667,
2519
- "step": 1725
2520
- },
2521
- {
2522
- "epoch": 11.849315068493151,
2523
- "grad_norm": 0.421875,
2524
- "learning_rate": 2.569618372242727e-05,
2525
- "loss": 0.962,
2526
- "step": 1730
2527
- },
2528
- {
2529
- "epoch": 11.883561643835616,
2530
- "grad_norm": 0.427734375,
2531
- "learning_rate": 2.5165188302478215e-05,
2532
- "loss": 0.9514,
2533
- "step": 1735
2534
- },
2535
- {
2536
- "epoch": 11.917808219178083,
2537
- "grad_norm": 0.435546875,
2538
- "learning_rate": 2.4638945883216235e-05,
2539
- "loss": 0.9446,
2540
- "step": 1740
2541
- },
2542
- {
2543
- "epoch": 11.952054794520548,
2544
- "grad_norm": 0.412109375,
2545
- "learning_rate": 2.411748988799769e-05,
2546
- "loss": 0.9473,
2547
- "step": 1745
2548
- },
2549
- {
2550
- "epoch": 11.986301369863014,
2551
- "grad_norm": 0.4375,
2552
- "learning_rate": 2.3600853436177672e-05,
2553
- "loss": 0.97,
2554
- "step": 1750
2555
- },
2556
- {
2557
- "epoch": 12.0,
2558
- "eval_loss": 2.5957212448120117,
2559
- "eval_runtime": 0.5471,
2560
- "eval_samples_per_second": 18.279,
2561
- "eval_steps_per_second": 1.828,
2562
- "step": 1752
2563
- },
2564
- {
2565
- "epoch": 12.020547945205479,
2566
- "grad_norm": 0.388671875,
2567
- "learning_rate": 2.3089069341006565e-05,
2568
- "loss": 0.9617,
2569
- "step": 1755
2570
- },
2571
- {
2572
- "epoch": 12.054794520547945,
2573
- "grad_norm": 0.447265625,
2574
- "learning_rate": 2.2582170107545852e-05,
2575
- "loss": 0.9448,
2576
- "step": 1760
2577
- },
2578
- {
2579
- "epoch": 12.08904109589041,
2580
- "grad_norm": 0.41015625,
2581
- "learning_rate": 2.2080187930603668e-05,
2582
- "loss": 0.9451,
2583
- "step": 1765
2584
- },
2585
- {
2586
- "epoch": 12.123287671232877,
2587
- "grad_norm": 0.41796875,
2588
- "learning_rate": 2.1583154692689976e-05,
2589
- "loss": 0.9472,
2590
- "step": 1770
2591
- },
2592
- {
2593
- "epoch": 12.157534246575343,
2594
- "grad_norm": 0.412109375,
2595
- "learning_rate": 2.109110196199171e-05,
2596
- "loss": 0.9536,
2597
- "step": 1775
2598
- },
2599
- {
2600
- "epoch": 12.191780821917808,
2601
- "grad_norm": 0.3984375,
2602
- "learning_rate": 2.0604060990367624e-05,
2603
- "loss": 0.9578,
2604
- "step": 1780
2605
- },
2606
- {
2607
- "epoch": 12.226027397260275,
2608
- "grad_norm": 0.412109375,
2609
- "learning_rate": 2.0122062711363532e-05,
2610
- "loss": 0.9515,
2611
- "step": 1785
2612
- },
2613
- {
2614
- "epoch": 12.26027397260274,
2615
- "grad_norm": 0.423828125,
2616
- "learning_rate": 1.9645137738247422e-05,
2617
- "loss": 0.9438,
2618
- "step": 1790
2619
- },
2620
- {
2621
- "epoch": 12.294520547945206,
2622
- "grad_norm": 0.40625,
2623
- "learning_rate": 1.9173316362065384e-05,
2624
- "loss": 0.9516,
2625
- "step": 1795
2626
- },
2627
- {
2628
- "epoch": 12.32876712328767,
2629
- "grad_norm": 0.41796875,
2630
- "learning_rate": 1.8706628549717452e-05,
2631
- "loss": 0.9517,
2632
- "step": 1800
2633
- },
2634
- {
2635
- "epoch": 12.363013698630137,
2636
- "grad_norm": 0.41015625,
2637
- "learning_rate": 1.824510394205453e-05,
2638
- "loss": 0.9475,
2639
- "step": 1805
2640
- },
2641
- {
2642
- "epoch": 12.397260273972602,
2643
- "grad_norm": 0.412109375,
2644
- "learning_rate": 1.7788771851995655e-05,
2645
- "loss": 0.9536,
2646
- "step": 1810
2647
- },
2648
- {
2649
- "epoch": 12.431506849315069,
2650
- "grad_norm": 0.421875,
2651
- "learning_rate": 1.7337661262666294e-05,
2652
- "loss": 0.9551,
2653
- "step": 1815
2654
- },
2655
- {
2656
- "epoch": 12.465753424657533,
2657
- "grad_norm": 0.40625,
2658
- "learning_rate": 1.6891800825557535e-05,
2659
- "loss": 0.9537,
2660
- "step": 1820
2661
- },
2662
- {
2663
- "epoch": 12.5,
2664
- "grad_norm": 0.419921875,
2665
- "learning_rate": 1.6451218858706374e-05,
2666
- "loss": 0.9516,
2667
- "step": 1825
2668
- },
2669
- {
2670
- "epoch": 12.534246575342467,
2671
- "grad_norm": 0.408203125,
2672
- "learning_rate": 1.601594334489702e-05,
2673
- "loss": 0.9535,
2674
- "step": 1830
2675
- },
2676
- {
2677
- "epoch": 12.568493150684931,
2678
- "grad_norm": 0.4375,
2679
- "learning_rate": 1.5586001929883865e-05,
2680
- "loss": 0.9489,
2681
- "step": 1835
2682
- },
2683
- {
2684
- "epoch": 12.602739726027398,
2685
- "grad_norm": 0.41015625,
2686
- "learning_rate": 1.516142192063521e-05,
2687
- "loss": 0.9554,
2688
- "step": 1840
2689
- },
2690
- {
2691
- "epoch": 12.636986301369863,
2692
- "grad_norm": 0.41015625,
2693
- "learning_rate": 1.474223028359939e-05,
2694
- "loss": 0.9518,
2695
- "step": 1845
2696
- },
2697
- {
2698
- "epoch": 12.67123287671233,
2699
- "grad_norm": 0.41015625,
2700
- "learning_rate": 1.4328453642991646e-05,
2701
- "loss": 0.9565,
2702
- "step": 1850
2703
- },
2704
- {
2705
- "epoch": 12.705479452054794,
2706
- "grad_norm": 0.396484375,
2707
- "learning_rate": 1.392011827910341e-05,
2708
- "loss": 0.9538,
2709
- "step": 1855
2710
- },
2711
- {
2712
- "epoch": 12.73972602739726,
2713
- "grad_norm": 0.416015625,
2714
- "learning_rate": 1.3517250126632986e-05,
2715
- "loss": 0.9498,
2716
- "step": 1860
2717
- },
2718
- {
2719
- "epoch": 12.773972602739725,
2720
- "grad_norm": 0.39453125,
2721
- "learning_rate": 1.311987477303842e-05,
2722
- "loss": 0.9468,
2723
- "step": 1865
2724
- },
2725
- {
2726
- "epoch": 12.808219178082192,
2727
- "grad_norm": 0.40234375,
2728
- "learning_rate": 1.2728017456912344e-05,
2729
- "loss": 0.9587,
2730
- "step": 1870
2731
- },
2732
- {
2733
- "epoch": 12.842465753424658,
2734
- "grad_norm": 0.419921875,
2735
- "learning_rate": 1.2341703066379074e-05,
2736
- "loss": 0.9463,
2737
- "step": 1875
2738
- },
2739
- {
2740
- "epoch": 12.876712328767123,
2741
- "grad_norm": 0.3984375,
2742
- "learning_rate": 1.1960956137513701e-05,
2743
- "loss": 0.9506,
2744
- "step": 1880
2745
- },
2746
- {
2747
- "epoch": 12.91095890410959,
2748
- "grad_norm": 0.408203125,
2749
- "learning_rate": 1.158580085278398e-05,
2750
- "loss": 0.9492,
2751
- "step": 1885
2752
- },
2753
- {
2754
- "epoch": 12.945205479452055,
2755
- "grad_norm": 0.416015625,
2756
- "learning_rate": 1.1216261039514087e-05,
2757
- "loss": 0.9569,
2758
- "step": 1890
2759
- },
2760
- {
2761
- "epoch": 12.979452054794521,
2762
- "grad_norm": 0.408203125,
2763
- "learning_rate": 1.0852360168371656e-05,
2764
- "loss": 0.9514,
2765
- "step": 1895
2766
- },
2767
- {
2768
- "epoch": 13.0,
2769
- "eval_loss": 2.598764181137085,
2770
- "eval_runtime": 0.5483,
2771
- "eval_samples_per_second": 18.237,
2772
- "eval_steps_per_second": 1.824,
2773
- "step": 1898
2774
- },
2775
- {
2776
- "epoch": 13.013698630136986,
2777
- "grad_norm": 0.412109375,
2778
- "learning_rate": 1.049412135187675e-05,
2779
- "loss": 0.9556,
2780
- "step": 1900
2781
- },
2782
- {
2783
- "epoch": 13.047945205479452,
2784
- "grad_norm": 0.40234375,
2785
- "learning_rate": 1.0141567342934132e-05,
2786
- "loss": 0.9543,
2787
- "step": 1905
2788
- },
2789
- {
2790
- "epoch": 13.082191780821917,
2791
- "grad_norm": 0.423828125,
2792
- "learning_rate": 9.794720533388024e-06,
2793
- "loss": 0.947,
2794
- "step": 1910
2795
- },
2796
- {
2797
- "epoch": 13.116438356164384,
2798
- "grad_norm": 0.408203125,
2799
- "learning_rate": 9.453602952599982e-06,
2800
- "loss": 0.9567,
2801
- "step": 1915
2802
- },
2803
- {
2804
- "epoch": 13.150684931506849,
2805
- "grad_norm": 0.408203125,
2806
- "learning_rate": 9.118236266049707e-06,
2807
- "loss": 0.956,
2808
- "step": 1920
2809
- },
2810
- {
2811
- "epoch": 13.184931506849315,
2812
- "grad_norm": 0.41015625,
2813
- "learning_rate": 8.788641773959105e-06,
2814
- "loss": 0.9508,
2815
- "step": 1925
2816
- },
2817
- {
2818
- "epoch": 13.219178082191782,
2819
- "grad_norm": 0.419921875,
2820
- "learning_rate": 8.464840409939267e-06,
2821
- "loss": 0.9556,
2822
- "step": 1930
2823
- },
2824
- {
2825
- "epoch": 13.253424657534246,
2826
- "grad_norm": 0.41015625,
2827
- "learning_rate": 8.146852739661105e-06,
2828
- "loss": 0.9577,
2829
- "step": 1935
2830
- },
2831
- {
2832
- "epoch": 13.287671232876713,
2833
- "grad_norm": 0.400390625,
2834
- "learning_rate": 7.834698959548914e-06,
2835
- "loss": 0.9505,
2836
- "step": 1940
2837
- },
2838
- {
2839
- "epoch": 13.321917808219178,
2840
- "grad_norm": 0.404296875,
2841
- "learning_rate": 7.528398895497924e-06,
2842
- "loss": 0.9556,
2843
- "step": 1945
2844
- },
2845
- {
2846
- "epoch": 13.356164383561644,
2847
- "grad_norm": 0.400390625,
2848
- "learning_rate": 7.2279720016148244e-06,
2849
- "loss": 0.9445,
2850
- "step": 1950
2851
- },
2852
- {
2853
- "epoch": 13.39041095890411,
2854
- "grad_norm": 0.40625,
2855
- "learning_rate": 6.933437358982409e-06,
2856
- "loss": 0.9491,
2857
- "step": 1955
2858
- },
2859
- {
2860
- "epoch": 13.424657534246576,
2861
- "grad_norm": 0.3984375,
2862
- "learning_rate": 6.6448136744474474e-06,
2863
- "loss": 0.9528,
2864
- "step": 1960
2865
- },
2866
- {
2867
- "epoch": 13.45890410958904,
2868
- "grad_norm": 0.40234375,
2869
- "learning_rate": 6.36211927943271e-06,
2870
- "loss": 0.9531,
2871
- "step": 1965
2872
- },
2873
- {
2874
- "epoch": 13.493150684931507,
2875
- "grad_norm": 0.43359375,
2876
- "learning_rate": 6.085372128772637e-06,
2877
- "loss": 0.949,
2878
- "step": 1970
2879
- },
2880
- {
2881
- "epoch": 13.527397260273972,
2882
- "grad_norm": 0.408203125,
2883
- "learning_rate": 5.814589799572956e-06,
2884
- "loss": 0.9538,
2885
- "step": 1975
2886
- },
2887
- {
2888
- "epoch": 13.561643835616438,
2889
- "grad_norm": 0.40625,
2890
- "learning_rate": 5.549789490094304e-06,
2891
- "loss": 0.9481,
2892
- "step": 1980
2893
- },
2894
- {
2895
- "epoch": 13.595890410958905,
2896
- "grad_norm": 0.412109375,
2897
- "learning_rate": 5.290988018659937e-06,
2898
- "loss": 0.9523,
2899
- "step": 1985
2900
- },
2901
- {
2902
- "epoch": 13.63013698630137,
2903
- "grad_norm": 0.390625,
2904
- "learning_rate": 5.038201822587474e-06,
2905
- "loss": 0.9483,
2906
- "step": 1990
2907
- },
2908
- {
2909
- "epoch": 13.664383561643836,
2910
- "grad_norm": 0.416015625,
2911
- "learning_rate": 4.79144695714504e-06,
2912
- "loss": 0.9478,
2913
- "step": 1995
2914
- },
2915
- {
2916
- "epoch": 13.698630136986301,
2917
- "grad_norm": 0.404296875,
2918
- "learning_rate": 4.550739094531386e-06,
2919
- "loss": 0.9459,
2920
- "step": 2000
2921
- },
2922
- {
2923
- "epoch": 13.732876712328768,
2924
- "grad_norm": 0.38671875,
2925
- "learning_rate": 4.316093522880648e-06,
2926
- "loss": 0.9474,
2927
- "step": 2005
2928
- },
2929
- {
2930
- "epoch": 13.767123287671232,
2931
- "grad_norm": 0.392578125,
2932
- "learning_rate": 4.087525145291204e-06,
2933
- "loss": 0.9457,
2934
- "step": 2010
2935
- },
2936
- {
2937
- "epoch": 13.801369863013699,
2938
- "grad_norm": 0.408203125,
2939
- "learning_rate": 3.865048478879241e-06,
2940
- "loss": 0.9483,
2941
- "step": 2015
2942
- },
2943
- {
2944
- "epoch": 13.835616438356164,
2945
- "grad_norm": 0.404296875,
2946
- "learning_rate": 3.6486776538566803e-06,
2947
- "loss": 0.9491,
2948
- "step": 2020
2949
- },
2950
- {
2951
- "epoch": 13.86986301369863,
2952
- "grad_norm": 0.41796875,
2953
- "learning_rate": 3.4384264126337328e-06,
2954
- "loss": 0.9497,
2955
- "step": 2025
2956
- },
2957
- {
2958
- "epoch": 13.904109589041095,
2959
- "grad_norm": 0.396484375,
2960
- "learning_rate": 3.2343081089460603e-06,
2961
- "loss": 0.9514,
2962
- "step": 2030
2963
- },
2964
- {
2965
- "epoch": 13.938356164383562,
2966
- "grad_norm": 0.40234375,
2967
- "learning_rate": 3.0363357070066544e-06,
2968
- "loss": 0.9484,
2969
- "step": 2035
2970
- },
2971
- {
2972
- "epoch": 13.972602739726028,
2973
- "grad_norm": 0.431640625,
2974
- "learning_rate": 2.8445217806824077e-06,
2975
- "loss": 0.9469,
2976
- "step": 2040
2977
- },
2978
- {
2979
- "epoch": 14.0,
2980
- "eval_loss": 2.599691867828369,
2981
- "eval_runtime": 0.5385,
2982
- "eval_samples_per_second": 18.57,
2983
- "eval_steps_per_second": 1.857,
2984
- "step": 2044
2985
- },
2986
- {
2987
- "epoch": 14.006849315068493,
2988
- "grad_norm": 0.4296875,
2989
- "learning_rate": 2.658878512695562e-06,
2990
- "loss": 0.9462,
2991
- "step": 2045
2992
- },
2993
- {
2994
- "epoch": 14.04109589041096,
2995
- "grad_norm": 0.41015625,
2996
- "learning_rate": 2.4794176938498837e-06,
2997
- "loss": 0.9509,
2998
- "step": 2050
2999
- },
3000
- {
3001
- "epoch": 14.075342465753424,
3002
- "grad_norm": 0.39453125,
3003
- "learning_rate": 2.30615072228183e-06,
3004
- "loss": 0.9438,
3005
- "step": 2055
3006
- },
3007
- {
3008
- "epoch": 14.10958904109589,
3009
- "grad_norm": 0.400390625,
3010
- "learning_rate": 2.139088602736616e-06,
3011
- "loss": 0.9567,
3012
- "step": 2060
3013
- },
3014
- {
3015
- "epoch": 14.143835616438356,
3016
- "grad_norm": 0.404296875,
3017
- "learning_rate": 1.9782419458692193e-06,
3018
- "loss": 0.9454,
3019
- "step": 2065
3020
- },
3021
- {
3022
- "epoch": 14.178082191780822,
3023
- "grad_norm": 0.404296875,
3024
- "learning_rate": 1.8236209675705274e-06,
3025
- "loss": 0.9486,
3026
- "step": 2070
3027
- },
3028
- {
3029
- "epoch": 14.212328767123287,
3030
- "grad_norm": 0.408203125,
3031
- "learning_rate": 1.6752354883184717e-06,
3032
- "loss": 0.9521,
3033
- "step": 2075
3034
- },
3035
- {
3036
- "epoch": 14.246575342465754,
3037
- "grad_norm": 0.40625,
3038
- "learning_rate": 1.5330949325542797e-06,
3039
- "loss": 0.9556,
3040
- "step": 2080
3041
- },
3042
- {
3043
- "epoch": 14.280821917808218,
3044
- "grad_norm": 0.408203125,
3045
- "learning_rate": 1.397208328083921e-06,
3046
- "loss": 0.9574,
3047
- "step": 2085
3048
- },
3049
- {
3050
- "epoch": 14.315068493150685,
3051
- "grad_norm": 0.427734375,
3052
- "learning_rate": 1.2675843055046765e-06,
3053
- "loss": 0.9557,
3054
- "step": 2090
3055
- },
3056
- {
3057
- "epoch": 14.349315068493151,
3058
- "grad_norm": 0.41015625,
3059
- "learning_rate": 1.144231097657078e-06,
3060
- "loss": 0.952,
3061
- "step": 2095
3062
- },
3063
- {
3064
- "epoch": 14.383561643835616,
3065
- "grad_norm": 0.404296875,
3066
- "learning_rate": 1.0271565391018922e-06,
3067
- "loss": 0.9475,
3068
- "step": 2100
3069
- },
3070
- {
3071
- "epoch": 14.417808219178083,
3072
- "grad_norm": 0.41796875,
3073
- "learning_rate": 9.163680656226303e-07,
3074
- "loss": 0.9548,
3075
- "step": 2105
3076
- },
3077
- {
3078
- "epoch": 14.452054794520548,
3079
- "grad_norm": 0.404296875,
3080
- "learning_rate": 8.118727137532034e-07,
3081
- "loss": 0.9473,
3082
- "step": 2110
3083
- },
3084
- {
3085
- "epoch": 14.486301369863014,
3086
- "grad_norm": 0.396484375,
3087
- "learning_rate": 7.136771203310245e-07,
3088
- "loss": 0.9593,
3089
- "step": 2115
3090
- },
3091
- {
3092
- "epoch": 14.520547945205479,
3093
- "grad_norm": 0.404296875,
3094
- "learning_rate": 6.21787522075512e-07,
3095
- "loss": 0.9442,
3096
- "step": 2120
3097
- },
3098
- {
3099
- "epoch": 14.554794520547945,
3100
- "grad_norm": 0.396484375,
3101
- "learning_rate": 5.362097551919631e-07,
3102
- "loss": 0.9509,
3103
- "step": 2125
3104
- },
3105
- {
3106
- "epoch": 14.58904109589041,
3107
- "grad_norm": 0.408203125,
3108
- "learning_rate": 4.569492550008603e-07,
3109
- "loss": 0.943,
3110
- "step": 2130
3111
- },
3112
- {
3113
- "epoch": 14.623287671232877,
3114
- "grad_norm": 0.408203125,
3115
- "learning_rate": 3.84011055592659e-07,
3116
- "loss": 0.9539,
3117
- "step": 2135
3118
- },
3119
- {
3120
- "epoch": 14.657534246575342,
3121
- "grad_norm": 0.4140625,
3122
- "learning_rate": 3.1739978950806603e-07,
3123
- "loss": 0.9477,
3124
- "step": 2140
3125
- },
3126
- {
3127
- "epoch": 14.691780821917808,
3128
- "grad_norm": 0.404296875,
3129
- "learning_rate": 2.5711968744382974e-07,
3130
- "loss": 0.9532,
3131
- "step": 2145
3132
- },
3133
- {
3134
- "epoch": 14.726027397260275,
3135
- "grad_norm": 0.404296875,
3136
- "learning_rate": 2.0317457798398888e-07,
3137
- "loss": 0.9453,
3138
- "step": 2150
3139
- },
3140
- {
3141
- "epoch": 14.76027397260274,
3142
- "grad_norm": 0.392578125,
3143
- "learning_rate": 1.5556788735676675e-07,
3144
- "loss": 0.9501,
3145
- "step": 2155
3146
- },
3147
- {
3148
- "epoch": 14.794520547945206,
3149
- "grad_norm": 0.423828125,
3150
- "learning_rate": 1.143026392168789e-07,
3151
- "loss": 0.9372,
3152
- "step": 2160
3153
- },
3154
- {
3155
- "epoch": 14.82876712328767,
3156
- "grad_norm": 0.39453125,
3157
- "learning_rate": 7.938145445357536e-08,
3158
- "loss": 0.9602,
3159
- "step": 2165
3160
- },
3161
- {
3162
- "epoch": 14.863013698630137,
3163
- "grad_norm": 0.41015625,
3164
- "learning_rate": 5.0806551024129565e-08,
3165
- "loss": 0.9451,
3166
- "step": 2170
3167
- },
3168
- {
3169
- "epoch": 14.897260273972602,
3170
- "grad_norm": 0.412109375,
3171
- "learning_rate": 2.8579743813006432e-08,
3172
- "loss": 0.9433,
3173
- "step": 2175
3174
- },
3175
- {
3176
- "epoch": 14.931506849315069,
3177
- "grad_norm": 0.408203125,
3178
- "learning_rate": 1.270244451652136e-08,
3179
- "loss": 0.9548,
3180
- "step": 2180
3181
- },
3182
- {
3183
- "epoch": 14.965753424657533,
3184
- "grad_norm": 0.408203125,
3185
- "learning_rate": 3.175661553256326e-09,
3186
- "loss": 0.9541,
3187
- "step": 2185
3188
- },
3189
- {
3190
- "epoch": 15.0,
3191
- "grad_norm": 0.408203125,
3192
- "learning_rate": 0.0,
3193
- "loss": 0.9469,
3194
- "step": 2190
3195
- },
3196
- {
3197
- "epoch": 15.0,
3198
- "eval_loss": 2.5990421772003174,
3199
- "eval_runtime": 0.5553,
3200
- "eval_samples_per_second": 18.009,
3201
- "eval_steps_per_second": 1.801,
3202
- "step": 2190
3203
- },
3204
- {
3205
- "epoch": 15.0,
3206
- "step": 2190,
3207
- "total_flos": 1.2863476116823736e+18,
3208
- "train_loss": 1.080002195214572,
3209
- "train_runtime": 11705.6736,
3210
- "train_samples_per_second": 8.973,
3211
- "train_steps_per_second": 0.187
3212
  }
3213
  ],
3214
  "logging_steps": 5,
3215
- "max_steps": 2190,
3216
  "num_input_tokens_seen": 0,
3217
- "num_train_epochs": 15,
3218
  "save_steps": 100,
3219
  "stateful_callbacks": {
3220
  "TrainerControl": {
@@ -3228,7 +2222,7 @@
3228
  "attributes": {}
3229
  }
3230
  },
3231
- "total_flos": 1.2863476116823736e+18,
3232
  "train_batch_size": 8,
3233
  "trial_name": null,
3234
  "trial_params": null
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 10.273972602739725,
5
  "eval_steps": 500,
6
+ "global_step": 1500,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
  "epoch": 0.00684931506849315,
13
+ "grad_norm": 3.5625,
14
  "learning_rate": 9.132420091324201e-07,
15
  "loss": 3.0017,
16
  "step": 1
17
  },
18
  {
19
  "epoch": 0.03424657534246575,
20
+ "grad_norm": 2.9375,
21
  "learning_rate": 4.566210045662101e-06,
22
+ "loss": 3.0725,
23
  "step": 5
24
  },
25
  {
26
  "epoch": 0.0684931506849315,
27
+ "grad_norm": 3.078125,
28
  "learning_rate": 9.132420091324201e-06,
29
+ "loss": 3.0374,
30
  "step": 10
31
  },
32
  {
33
  "epoch": 0.10273972602739725,
34
+ "grad_norm": 2.515625,
35
  "learning_rate": 1.3698630136986302e-05,
36
+ "loss": 3.0044,
37
  "step": 15
38
  },
39
  {
40
  "epoch": 0.136986301369863,
41
+ "grad_norm": 2.3125,
42
  "learning_rate": 1.8264840182648402e-05,
43
+ "loss": 2.9373,
44
  "step": 20
45
  },
46
  {
47
  "epoch": 0.17123287671232876,
48
+ "grad_norm": 4.90625,
49
  "learning_rate": 2.2831050228310503e-05,
50
+ "loss": 2.7849,
51
  "step": 25
52
  },
53
  {
54
  "epoch": 0.2054794520547945,
55
+ "grad_norm": 17.0,
56
  "learning_rate": 2.7397260273972603e-05,
57
+ "loss": 2.6263,
58
  "step": 30
59
  },
60
  {
61
  "epoch": 0.23972602739726026,
62
+ "grad_norm": 1.0859375,
63
  "learning_rate": 3.1963470319634704e-05,
64
+ "loss": 2.4603,
65
  "step": 35
66
  },
67
  {
68
  "epoch": 0.273972602739726,
69
+ "grad_norm": 1.75,
70
  "learning_rate": 3.6529680365296805e-05,
71
+ "loss": 2.3423,
72
  "step": 40
73
  },
74
  {
75
  "epoch": 0.3082191780821918,
76
+ "grad_norm": 3.0,
77
  "learning_rate": 4.1095890410958905e-05,
78
+ "loss": 2.2364,
79
  "step": 45
80
  },
81
  {
82
  "epoch": 0.3424657534246575,
83
+ "grad_norm": 1.0546875,
84
  "learning_rate": 4.5662100456621006e-05,
85
+ "loss": 2.0795,
86
  "step": 50
87
  },
88
  {
89
  "epoch": 0.3767123287671233,
90
+ "grad_norm": 1.734375,
91
  "learning_rate": 5.0228310502283106e-05,
92
+ "loss": 1.9497,
93
  "step": 55
94
  },
95
  {
96
  "epoch": 0.410958904109589,
97
+ "grad_norm": 1.25,
98
  "learning_rate": 5.479452054794521e-05,
99
+ "loss": 1.8556,
100
  "step": 60
101
  },
102
  {
103
  "epoch": 0.4452054794520548,
104
+ "grad_norm": 0.640625,
105
  "learning_rate": 5.936073059360731e-05,
106
+ "loss": 1.759,
107
  "step": 65
108
  },
109
  {
110
  "epoch": 0.4794520547945205,
111
+ "grad_norm": 0.97265625,
112
  "learning_rate": 6.392694063926941e-05,
113
+ "loss": 1.6773,
114
  "step": 70
115
  },
116
  {
117
  "epoch": 0.5136986301369864,
118
+ "grad_norm": 1.9296875,
119
  "learning_rate": 6.84931506849315e-05,
120
+ "loss": 1.6105,
121
  "step": 75
122
  },
123
  {
124
  "epoch": 0.547945205479452,
125
+ "grad_norm": 0.51171875,
126
  "learning_rate": 7.305936073059361e-05,
127
+ "loss": 1.5517,
128
  "step": 80
129
  },
130
  {
131
  "epoch": 0.5821917808219178,
132
+ "grad_norm": 0.45703125,
133
  "learning_rate": 7.76255707762557e-05,
134
+ "loss": 1.4895,
135
  "step": 85
136
  },
137
  {
138
  "epoch": 0.6164383561643836,
139
+ "grad_norm": 0.326171875,
140
  "learning_rate": 8.219178082191781e-05,
141
+ "loss": 1.466,
142
  "step": 90
143
  },
144
  {
145
  "epoch": 0.6506849315068494,
146
+ "grad_norm": 0.283203125,
147
  "learning_rate": 8.67579908675799e-05,
148
+ "loss": 1.4237,
149
  "step": 95
150
  },
151
  {
152
  "epoch": 0.684931506849315,
153
+ "grad_norm": 0.333984375,
154
  "learning_rate": 9.132420091324201e-05,
155
+ "loss": 1.3836,
156
  "step": 100
157
  },
158
  {
159
  "epoch": 0.7191780821917808,
160
+ "grad_norm": 0.578125,
161
  "learning_rate": 9.58904109589041e-05,
162
+ "loss": 1.3655,
163
  "step": 105
164
  },
165
  {
166
  "epoch": 0.7534246575342466,
167
+ "grad_norm": 0.484375,
168
  "learning_rate": 0.00010045662100456621,
169
+ "loss": 1.3369,
170
  "step": 110
171
  },
172
  {
173
  "epoch": 0.7876712328767124,
174
+ "grad_norm": 0.3671875,
175
  "learning_rate": 0.00010502283105022832,
176
+ "loss": 1.3149,
177
  "step": 115
178
  },
179
  {
180
  "epoch": 0.821917808219178,
181
+ "grad_norm": 0.9765625,
182
  "learning_rate": 0.00010958904109589041,
183
+ "loss": 1.3051,
184
  "step": 120
185
  },
186
  {
187
  "epoch": 0.8561643835616438,
188
+ "grad_norm": 0.74609375,
189
  "learning_rate": 0.00011415525114155252,
190
+ "loss": 1.2835,
191
  "step": 125
192
  },
193
  {
194
  "epoch": 0.8904109589041096,
195
+ "grad_norm": 0.271484375,
196
  "learning_rate": 0.00011872146118721462,
197
+ "loss": 1.2805,
198
  "step": 130
199
  },
200
  {
201
  "epoch": 0.9246575342465754,
202
+ "grad_norm": 0.82421875,
203
  "learning_rate": 0.0001232876712328767,
204
+ "loss": 1.2617,
205
  "step": 135
206
  },
207
  {
208
  "epoch": 0.958904109589041,
209
+ "grad_norm": 0.498046875,
210
  "learning_rate": 0.00012785388127853882,
211
+ "loss": 1.2659,
212
  "step": 140
213
  },
214
  {
215
  "epoch": 0.9931506849315068,
216
+ "grad_norm": 0.28125,
217
  "learning_rate": 0.00013242009132420092,
218
+ "loss": 1.2474,
219
  "step": 145
220
  },
221
  {
222
  "epoch": 1.0,
223
+ "eval_loss": 2.523677110671997,
224
+ "eval_runtime": 0.5573,
225
+ "eval_samples_per_second": 17.944,
226
+ "eval_steps_per_second": 1.794,
227
  "step": 146
228
  },
229
  {
230
  "epoch": 1.0273972602739727,
231
+ "grad_norm": 0.58984375,
232
  "learning_rate": 0.000136986301369863,
233
+ "loss": 1.2351,
234
  "step": 150
235
  },
236
  {
237
  "epoch": 1.0616438356164384,
238
+ "grad_norm": 0.5234375,
239
  "learning_rate": 0.0001415525114155251,
240
+ "loss": 1.2256,
241
  "step": 155
242
  },
243
  {
244
  "epoch": 1.095890410958904,
245
+ "grad_norm": 0.55859375,
246
  "learning_rate": 0.00014611872146118722,
247
+ "loss": 1.2203,
248
  "step": 160
249
  },
250
  {
251
  "epoch": 1.13013698630137,
252
+ "grad_norm": 0.35546875,
253
  "learning_rate": 0.00015068493150684933,
254
+ "loss": 1.1994,
255
  "step": 165
256
  },
257
  {
258
  "epoch": 1.1643835616438356,
259
+ "grad_norm": 0.345703125,
260
  "learning_rate": 0.0001552511415525114,
261
+ "loss": 1.2069,
262
  "step": 170
263
  },
264
  {
265
  "epoch": 1.1986301369863013,
266
+ "grad_norm": 0.412109375,
267
  "learning_rate": 0.00015981735159817351,
268
+ "loss": 1.1912,
269
  "step": 175
270
  },
271
  {
272
  "epoch": 1.2328767123287672,
273
+ "grad_norm": 0.365234375,
274
  "learning_rate": 0.00016438356164383562,
275
+ "loss": 1.1879,
276
  "step": 180
277
  },
278
  {
279
  "epoch": 1.2671232876712328,
280
+ "grad_norm": 0.42578125,
281
  "learning_rate": 0.00016894977168949773,
282
+ "loss": 1.1983,
283
  "step": 185
284
  },
285
  {
286
  "epoch": 1.3013698630136985,
287
+ "grad_norm": 0.63671875,
288
  "learning_rate": 0.0001735159817351598,
289
+ "loss": 1.1872,
290
  "step": 190
291
  },
292
  {
293
  "epoch": 1.3356164383561644,
294
+ "grad_norm": 0.376953125,
295
  "learning_rate": 0.00017808219178082192,
296
+ "loss": 1.1806,
297
  "step": 195
298
  },
299
  {
300
  "epoch": 1.36986301369863,
301
+ "grad_norm": 1.1640625,
302
  "learning_rate": 0.00018264840182648402,
303
+ "loss": 1.1849,
304
  "step": 200
305
  },
306
  {
307
  "epoch": 1.404109589041096,
308
+ "grad_norm": 1.046875,
309
  "learning_rate": 0.00018721461187214613,
310
+ "loss": 1.1782,
311
  "step": 205
312
  },
313
  {
314
  "epoch": 1.4383561643835616,
315
+ "grad_norm": 0.373046875,
316
  "learning_rate": 0.0001917808219178082,
317
+ "loss": 1.1727,
318
  "step": 210
319
  },
320
  {
321
  "epoch": 1.4726027397260273,
322
+ "grad_norm": 0.482421875,
323
  "learning_rate": 0.00019634703196347032,
324
+ "loss": 1.1725,
325
  "step": 215
326
  },
327
  {
328
  "epoch": 1.5068493150684932,
329
+ "grad_norm": 0.80859375,
330
  "learning_rate": 0.00019999987297289245,
331
+ "loss": 1.1611,
332
  "step": 220
333
  },
334
  {
335
  "epoch": 1.541095890410959,
336
+ "grad_norm": 0.56640625,
337
  "learning_rate": 0.00019999542705801296,
338
+ "loss": 1.1642,
339
  "step": 225
340
  },
341
  {
342
  "epoch": 1.5753424657534247,
343
+ "grad_norm": 0.361328125,
344
  "learning_rate": 0.00019998463011046926,
345
+ "loss": 1.1608,
346
  "step": 230
347
  },
348
  {
349
  "epoch": 1.6095890410958904,
350
+ "grad_norm": 0.76953125,
351
  "learning_rate": 0.00019996748281601038,
352
+ "loss": 1.1563,
353
  "step": 235
354
  },
355
  {
356
  "epoch": 1.643835616438356,
357
+ "grad_norm": 0.388671875,
358
  "learning_rate": 0.00019994398626371643,
359
+ "loss": 1.1457,
360
  "step": 240
361
  },
362
  {
363
  "epoch": 1.678082191780822,
364
+ "grad_norm": 0.45703125,
365
  "learning_rate": 0.0001999141419459293,
366
+ "loss": 1.1609,
367
  "step": 245
368
  },
369
  {
370
  "epoch": 1.7123287671232876,
371
+ "grad_norm": 0.70703125,
372
  "learning_rate": 0.00019987795175815807,
373
+ "loss": 1.1479,
374
  "step": 250
375
  },
376
  {
377
  "epoch": 1.7465753424657535,
378
+ "grad_norm": 0.451171875,
379
  "learning_rate": 0.0001998354179989585,
380
+ "loss": 1.148,
381
  "step": 255
382
  },
383
  {
384
  "epoch": 1.7808219178082192,
385
+ "grad_norm": 0.421875,
386
  "learning_rate": 0.0001997865433697871,
387
+ "loss": 1.1513,
388
  "step": 260
389
  },
390
  {
391
  "epoch": 1.8150684931506849,
392
+ "grad_norm": 0.64453125,
393
  "learning_rate": 0.00019973133097482947,
394
+ "loss": 1.1327,
395
  "step": 265
396
  },
397
  {
398
  "epoch": 1.8493150684931505,
399
+ "grad_norm": 0.326171875,
400
  "learning_rate": 0.00019966978432080316,
401
+ "loss": 1.1424,
402
  "step": 270
403
  },
404
  {
405
  "epoch": 1.8835616438356164,
406
+ "grad_norm": 0.4375,
407
  "learning_rate": 0.00019960190731673505,
408
+ "loss": 1.1387,
409
  "step": 275
410
  },
411
  {
412
  "epoch": 1.9178082191780823,
413
+ "grad_norm": 0.34765625,
414
  "learning_rate": 0.00019952770427371304,
415
+ "loss": 1.1258,
416
  "step": 280
417
  },
418
  {
419
  "epoch": 1.952054794520548,
420
+ "grad_norm": 0.447265625,
421
  "learning_rate": 0.00019944717990461207,
422
+ "loss": 1.1226,
423
  "step": 285
424
  },
425
  {
426
  "epoch": 1.9863013698630136,
427
+ "grad_norm": 0.427734375,
428
  "learning_rate": 0.00019936033932379504,
429
+ "loss": 1.1269,
430
  "step": 290
431
  },
432
  {
433
  "epoch": 2.0,
434
+ "eval_loss": 2.4804677963256836,
435
+ "eval_runtime": 0.5614,
436
+ "eval_samples_per_second": 17.814,
437
+ "eval_steps_per_second": 1.781,
438
  "step": 292
439
  },
440
  {
441
  "epoch": 2.0205479452054793,
442
+ "grad_norm": 0.4609375,
443
  "learning_rate": 0.00019926718804678785,
444
+ "loss": 1.1225,
445
  "step": 295
446
  },
447
  {
448
  "epoch": 2.0547945205479454,
449
+ "grad_norm": 0.435546875,
450
  "learning_rate": 0.000199167731989929,
451
+ "loss": 1.1022,
452
  "step": 300
453
  },
454
  {
455
  "epoch": 2.089041095890411,
456
+ "grad_norm": 0.4140625,
457
  "learning_rate": 0.00019906197746999408,
458
+ "loss": 1.1012,
459
  "step": 305
460
  },
461
  {
462
  "epoch": 2.1232876712328768,
463
+ "grad_norm": 0.3515625,
464
  "learning_rate": 0.00019894993120379435,
465
+ "loss": 1.0928,
466
  "step": 310
467
  },
468
  {
469
  "epoch": 2.1575342465753424,
470
+ "grad_norm": 0.43359375,
471
  "learning_rate": 0.00019883160030775016,
472
+ "loss": 1.1032,
473
  "step": 315
474
  },
475
  {
476
  "epoch": 2.191780821917808,
477
+ "grad_norm": 0.96484375,
478
  "learning_rate": 0.00019870699229743911,
479
+ "loss": 1.0966,
480
  "step": 320
481
  },
482
  {
483
  "epoch": 2.2260273972602738,
484
+ "grad_norm": 0.81640625,
485
  "learning_rate": 0.0001985761150871185,
486
+ "loss": 1.0952,
487
  "step": 325
488
  },
489
  {
490
  "epoch": 2.26027397260274,
491
+ "grad_norm": 0.462890625,
492
  "learning_rate": 0.00019843897698922284,
493
+ "loss": 1.0936,
494
  "step": 330
495
  },
496
  {
497
  "epoch": 2.2945205479452055,
498
+ "grad_norm": 0.3203125,
499
  "learning_rate": 0.00019829558671383585,
500
+ "loss": 1.0938,
501
  "step": 335
502
  },
503
  {
504
  "epoch": 2.328767123287671,
505
+ "grad_norm": 0.494140625,
506
  "learning_rate": 0.00019814595336813725,
507
+ "loss": 1.0856,
508
  "step": 340
509
  },
510
  {
511
  "epoch": 2.363013698630137,
512
+ "grad_norm": 0.353515625,
513
  "learning_rate": 0.0001979900864558242,
514
+ "loss": 1.0851,
515
  "step": 345
516
  },
517
  {
518
  "epoch": 2.3972602739726026,
519
+ "grad_norm": 0.3359375,
520
  "learning_rate": 0.00019782799587650805,
521
+ "loss": 1.1018,
522
  "step": 350
523
  },
524
  {
525
  "epoch": 2.4315068493150687,
526
+ "grad_norm": 0.39453125,
527
  "learning_rate": 0.00019765969192508508,
528
+ "loss": 1.0882,
529
  "step": 355
530
  },
531
  {
532
  "epoch": 2.4657534246575343,
533
+ "grad_norm": 0.341796875,
534
  "learning_rate": 0.00019748518529108316,
535
+ "loss": 1.0932,
536
  "step": 360
537
  },
538
  {
539
  "epoch": 2.5,
540
+ "grad_norm": 0.404296875,
541
  "learning_rate": 0.00019730448705798239,
542
+ "loss": 1.0945,
543
  "step": 365
544
  },
545
  {
546
  "epoch": 2.5342465753424657,
547
+ "grad_norm": 0.35546875,
548
  "learning_rate": 0.00019711760870251143,
549
+ "loss": 1.0881,
550
  "step": 370
551
  },
552
  {
553
  "epoch": 2.5684931506849313,
554
+ "grad_norm": 0.40234375,
555
  "learning_rate": 0.00019692456209391846,
556
+ "loss": 1.0802,
557
  "step": 375
558
  },
559
  {
560
  "epoch": 2.602739726027397,
561
+ "grad_norm": 0.52734375,
562
  "learning_rate": 0.0001967253594932173,
563
+ "loss": 1.0822,
564
  "step": 380
565
  },
566
  {
567
  "epoch": 2.636986301369863,
568
+ "grad_norm": 0.337890625,
569
  "learning_rate": 0.00019652001355240878,
570
+ "loss": 1.0907,
571
  "step": 385
572
  },
573
  {
574
  "epoch": 2.671232876712329,
575
+ "grad_norm": 0.373046875,
576
  "learning_rate": 0.00019630853731367713,
577
+ "loss": 1.0868,
578
  "step": 390
579
  },
580
  {
581
  "epoch": 2.7054794520547945,
582
+ "grad_norm": 0.40234375,
583
  "learning_rate": 0.0001960909442085615,
584
+ "loss": 1.086,
585
  "step": 395
586
  },
587
  {
588
  "epoch": 2.73972602739726,
589
+ "grad_norm": 0.384765625,
590
  "learning_rate": 0.00019586724805710306,
591
+ "loss": 1.0746,
592
  "step": 400
593
  },
594
  {
595
  "epoch": 2.7739726027397262,
596
+ "grad_norm": 0.353515625,
597
  "learning_rate": 0.0001956374630669672,
598
+ "loss": 1.0832,
599
  "step": 405
600
  },
601
  {
602
  "epoch": 2.808219178082192,
603
+ "grad_norm": 0.34765625,
604
  "learning_rate": 0.00019540160383254107,
605
+ "loss": 1.0753,
606
  "step": 410
607
  },
608
  {
609
  "epoch": 2.8424657534246576,
610
+ "grad_norm": 0.328125,
611
  "learning_rate": 0.00019515968533400673,
612
+ "loss": 1.0844,
613
  "step": 415
614
  },
615
  {
616
  "epoch": 2.8767123287671232,
617
+ "grad_norm": 0.34765625,
618
  "learning_rate": 0.00019491172293638968,
619
+ "loss": 1.083,
620
  "step": 420
621
  },
622
  {
623
  "epoch": 2.910958904109589,
624
+ "grad_norm": 0.369140625,
625
  "learning_rate": 0.00019465773238858298,
626
+ "loss": 1.0757,
627
  "step": 425
628
  },
629
  {
630
  "epoch": 2.9452054794520546,
631
+ "grad_norm": 0.56640625,
632
  "learning_rate": 0.00019439772982234697,
633
+ "loss": 1.075,
634
  "step": 430
635
  },
636
  {
637
  "epoch": 2.9794520547945207,
638
+ "grad_norm": 3.71875,
639
  "learning_rate": 0.00019413173175128473,
640
+ "loss": 1.0909,
641
  "step": 435
642
  },
643
  {
644
  "epoch": 3.0,
645
+ "eval_loss": 2.4892916679382324,
646
+ "eval_runtime": 0.5522,
647
+ "eval_samples_per_second": 18.108,
648
+ "eval_steps_per_second": 1.811,
649
  "step": 438
650
  },
651
  {
652
  "epoch": 3.0136986301369864,
653
+ "grad_norm": 1.3046875,
654
  "learning_rate": 0.0001938597550697932,
655
+ "loss": 1.0635,
656
  "step": 440
657
  },
658
  {
659
  "epoch": 3.047945205479452,
660
+ "grad_norm": 0.3984375,
661
  "learning_rate": 0.00019358181705199015,
662
+ "loss": 1.0518,
663
  "step": 445
664
  },
665
  {
666
  "epoch": 3.0821917808219177,
667
+ "grad_norm": 0.369140625,
668
  "learning_rate": 0.00019329793535061723,
669
+ "loss": 1.0509,
670
  "step": 450
671
  },
672
  {
673
  "epoch": 3.1164383561643834,
674
+ "grad_norm": 0.412109375,
675
  "learning_rate": 0.00019300812799591846,
676
+ "loss": 1.0529,
677
  "step": 455
678
  },
679
  {
680
  "epoch": 3.1506849315068495,
681
+ "grad_norm": 0.66015625,
682
  "learning_rate": 0.00019271241339449536,
683
+ "loss": 1.0416,
684
  "step": 460
685
  },
686
  {
687
  "epoch": 3.184931506849315,
688
+ "grad_norm": 0.89453125,
689
  "learning_rate": 0.00019241081032813772,
690
+ "loss": 1.0488,
691
  "step": 465
692
  },
693
  {
694
  "epoch": 3.219178082191781,
695
+ "grad_norm": 0.55078125,
696
  "learning_rate": 0.00019210333795263075,
697
+ "loss": 1.0402,
698
  "step": 470
699
  },
700
  {
701
  "epoch": 3.2534246575342465,
702
+ "grad_norm": 0.73046875,
703
  "learning_rate": 0.00019179001579653853,
704
+ "loss": 1.0568,
705
  "step": 475
706
  },
707
  {
708
  "epoch": 3.287671232876712,
709
+ "grad_norm": 1.0390625,
710
  "learning_rate": 0.0001914708637599636,
711
+ "loss": 1.0487,
712
  "step": 480
713
  },
714
  {
715
  "epoch": 3.3219178082191783,
716
+ "grad_norm": 0.400390625,
717
  "learning_rate": 0.00019114590211328288,
718
+ "loss": 1.0468,
719
  "step": 485
720
  },
721
  {
722
  "epoch": 3.356164383561644,
723
+ "grad_norm": 0.439453125,
724
  "learning_rate": 0.0001908151514958606,
725
+ "loss": 1.0538,
726
  "step": 490
727
  },
728
  {
729
  "epoch": 3.3904109589041096,
730
+ "grad_norm": 0.37109375,
731
  "learning_rate": 0.00019047863291473717,
732
+ "loss": 1.0441,
733
  "step": 495
734
  },
735
  {
736
  "epoch": 3.4246575342465753,
737
+ "grad_norm": 0.34765625,
738
  "learning_rate": 0.00019013636774329495,
739
+ "loss": 1.0521,
740
  "step": 500
741
  },
742
  {
743
  "epoch": 3.458904109589041,
744
+ "grad_norm": 0.4140625,
745
  "learning_rate": 0.00018978837771990085,
746
+ "loss": 1.0405,
747
  "step": 505
748
  },
749
  {
750
  "epoch": 3.493150684931507,
751
+ "grad_norm": 0.4375,
752
  "learning_rate": 0.0001894346849465257,
753
+ "loss": 1.0439,
754
  "step": 510
755
  },
756
  {
757
  "epoch": 3.5273972602739727,
758
+ "grad_norm": 0.349609375,
759
  "learning_rate": 0.00018907531188734026,
760
+ "loss": 1.0525,
761
  "step": 515
762
  },
763
  {
764
  "epoch": 3.5616438356164384,
765
+ "grad_norm": 0.47265625,
766
  "learning_rate": 0.00018871028136728874,
767
+ "loss": 1.0493,
768
  "step": 520
769
  },
770
  {
771
  "epoch": 3.595890410958904,
772
+ "grad_norm": 0.35546875,
773
  "learning_rate": 0.00018833961657063885,
774
+ "loss": 1.0405,
775
  "step": 525
776
  },
777
  {
778
  "epoch": 3.6301369863013697,
779
+ "grad_norm": 0.50390625,
780
  "learning_rate": 0.0001879633410395095,
781
+ "loss": 1.0452,
782
  "step": 530
783
  },
784
  {
785
  "epoch": 3.6643835616438354,
786
+ "grad_norm": 0.34765625,
787
  "learning_rate": 0.00018758147867237548,
788
+ "loss": 1.0515,
789
  "step": 535
790
  },
791
  {
792
  "epoch": 3.6986301369863015,
793
+ "grad_norm": 0.421875,
794
  "learning_rate": 0.00018719405372254948,
795
+ "loss": 1.0453,
796
  "step": 540
797
  },
798
  {
799
  "epoch": 3.732876712328767,
800
+ "grad_norm": 0.3359375,
801
  "learning_rate": 0.00018680109079664188,
802
+ "loss": 1.0356,
803
  "step": 545
804
  },
805
  {
806
  "epoch": 3.767123287671233,
807
+ "grad_norm": 0.333984375,
808
  "learning_rate": 0.0001864026148529978,
809
+ "loss": 1.0355,
810
  "step": 550
811
  },
812
  {
813
  "epoch": 3.8013698630136985,
814
+ "grad_norm": 0.427734375,
815
  "learning_rate": 0.00018599865120011192,
816
+ "loss": 1.0452,
817
  "step": 555
818
  },
819
  {
820
  "epoch": 3.8356164383561646,
821
+ "grad_norm": 0.34375,
822
  "learning_rate": 0.00018558922549502107,
823
+ "loss": 1.0258,
824
  "step": 560
825
  },
826
  {
827
  "epoch": 3.8698630136986303,
828
+ "grad_norm": 0.412109375,
829
  "learning_rate": 0.0001851743637416747,
830
+ "loss": 1.0423,
831
  "step": 565
832
  },
833
  {
834
  "epoch": 3.904109589041096,
835
+ "grad_norm": 0.31640625,
836
  "learning_rate": 0.00018475409228928312,
837
+ "loss": 1.0238,
838
  "step": 570
839
  },
840
  {
841
  "epoch": 3.9383561643835616,
842
+ "grad_norm": 0.400390625,
843
  "learning_rate": 0.00018432843783064429,
844
+ "loss": 1.041,
845
  "step": 575
846
  },
847
  {
848
  "epoch": 3.9726027397260273,
849
+ "grad_norm": 0.412109375,
850
  "learning_rate": 0.00018389742740044813,
851
+ "loss": 1.0354,
852
  "step": 580
853
  },
854
  {
855
  "epoch": 4.0,
856
+ "eval_loss": 2.5017333030700684,
857
+ "eval_runtime": 0.5568,
858
+ "eval_samples_per_second": 17.961,
859
+ "eval_steps_per_second": 1.796,
860
  "step": 584
861
  },
862
  {
863
  "epoch": 4.006849315068493,
864
+ "grad_norm": 0.52734375,
865
  "learning_rate": 0.00018346108837355972,
866
+ "loss": 1.0411,
867
  "step": 585
868
  },
869
  {
870
  "epoch": 4.041095890410959,
871
+ "grad_norm": 0.41796875,
872
  "learning_rate": 0.00018301944846328049,
873
+ "loss": 0.9963,
874
  "step": 590
875
  },
876
  {
877
  "epoch": 4.075342465753424,
878
+ "grad_norm": 0.36328125,
879
  "learning_rate": 0.0001825725357195881,
880
+ "loss": 1.0137,
881
  "step": 595
882
  },
883
  {
884
  "epoch": 4.109589041095891,
885
+ "grad_norm": 0.48046875,
886
  "learning_rate": 0.00018212037852735486,
887
+ "loss": 1.006,
888
  "step": 600
889
  },
890
  {
891
  "epoch": 4.1438356164383565,
892
+ "grad_norm": 0.4140625,
893
  "learning_rate": 0.0001816630056045451,
894
+ "loss": 1.0075,
895
  "step": 605
896
  },
897
  {
898
  "epoch": 4.178082191780822,
899
+ "grad_norm": 0.353515625,
900
  "learning_rate": 0.0001812004460003909,
901
+ "loss": 0.9975,
902
  "step": 610
903
  },
904
  {
905
  "epoch": 4.212328767123288,
906
+ "grad_norm": 0.365234375,
907
  "learning_rate": 0.00018073272909354727,
908
+ "loss": 1.0171,
909
  "step": 615
910
  },
911
  {
912
  "epoch": 4.2465753424657535,
913
+ "grad_norm": 0.51171875,
914
  "learning_rate": 0.0001802598845902262,
915
+ "loss": 0.9953,
916
  "step": 620
917
  },
918
  {
919
  "epoch": 4.280821917808219,
920
+ "grad_norm": 0.38671875,
921
  "learning_rate": 0.00017978194252230985,
922
+ "loss": 1.008,
923
  "step": 625
924
  },
925
  {
926
  "epoch": 4.315068493150685,
927
+ "grad_norm": 0.359375,
928
  "learning_rate": 0.00017929893324544332,
929
+ "loss": 0.9993,
930
  "step": 630
931
  },
932
  {
933
  "epoch": 4.3493150684931505,
934
+ "grad_norm": 0.56640625,
935
  "learning_rate": 0.0001788108874371063,
936
+ "loss": 1.0119,
937
  "step": 635
938
  },
939
  {
940
  "epoch": 4.383561643835616,
941
+ "grad_norm": 0.33203125,
942
  "learning_rate": 0.00017831783609466504,
943
+ "loss": 1.0047,
944
  "step": 640
945
  },
946
  {
947
  "epoch": 4.417808219178082,
948
+ "grad_norm": 0.341796875,
949
  "learning_rate": 0.00017781981053340337,
950
+ "loss": 1.0143,
951
  "step": 645
952
  },
953
  {
954
  "epoch": 4.4520547945205475,
955
+ "grad_norm": 0.345703125,
956
  "learning_rate": 0.00017731684238453385,
957
+ "loss": 1.0023,
958
  "step": 650
959
  },
960
  {
961
  "epoch": 4.486301369863014,
962
+ "grad_norm": 0.37890625,
963
  "learning_rate": 0.0001768089635931887,
964
+ "loss": 1.0125,
965
  "step": 655
966
  },
967
  {
968
  "epoch": 4.52054794520548,
969
+ "grad_norm": 0.609375,
970
  "learning_rate": 0.00017629620641639103,
971
+ "loss": 1.0074,
972
  "step": 660
973
  },
974
  {
975
  "epoch": 4.554794520547945,
976
+ "grad_norm": 0.36328125,
977
  "learning_rate": 0.00017577860342100579,
978
+ "loss": 1.0124,
979
  "step": 665
980
  },
981
  {
982
  "epoch": 4.589041095890411,
983
+ "grad_norm": 0.65625,
984
  "learning_rate": 0.0001752561874816717,
985
+ "loss": 1.015,
986
  "step": 670
987
  },
988
  {
989
  "epoch": 4.623287671232877,
990
+ "grad_norm": 0.38671875,
991
  "learning_rate": 0.00017472899177871297,
992
+ "loss": 1.0066,
993
  "step": 675
994
  },
995
  {
996
  "epoch": 4.657534246575342,
997
+ "grad_norm": 0.32421875,
998
  "learning_rate": 0.00017419704979603214,
999
+ "loss": 1.0182,
1000
  "step": 680
1001
  },
1002
  {
1003
  "epoch": 4.691780821917808,
1004
+ "grad_norm": 0.34375,
1005
  "learning_rate": 0.00017366039531898326,
1006
+ "loss": 1.0139,
1007
  "step": 685
1008
  },
1009
  {
1010
  "epoch": 4.726027397260274,
1011
+ "grad_norm": 0.349609375,
1012
  "learning_rate": 0.00017311906243222614,
1013
+ "loss": 1.0162,
1014
  "step": 690
1015
  },
1016
  {
1017
  "epoch": 4.760273972602739,
1018
+ "grad_norm": 0.3359375,
1019
  "learning_rate": 0.0001725730855175615,
1020
+ "loss": 1.019,
1021
  "step": 695
1022
  },
1023
  {
1024
  "epoch": 4.794520547945205,
1025
+ "grad_norm": 0.431640625,
1026
  "learning_rate": 0.00017202249925174723,
1027
+ "loss": 1.0051,
1028
  "step": 700
1029
  },
1030
  {
1031
  "epoch": 4.828767123287671,
1032
+ "grad_norm": 0.4140625,
1033
  "learning_rate": 0.00017146733860429612,
1034
+ "loss": 1.0174,
1035
  "step": 705
1036
  },
1037
  {
1038
  "epoch": 4.863013698630137,
1039
+ "grad_norm": 0.408203125,
1040
  "learning_rate": 0.0001709076388352546,
1041
+ "loss": 1.0065,
1042
  "step": 710
1043
  },
1044
  {
1045
  "epoch": 4.897260273972603,
1046
+ "grad_norm": 0.359375,
1047
  "learning_rate": 0.00017034343549296346,
1048
+ "loss": 1.0262,
1049
  "step": 715
1050
  },
1051
  {
1052
  "epoch": 4.931506849315069,
1053
+ "grad_norm": 0.44140625,
1054
  "learning_rate": 0.00016977476441179992,
1055
+ "loss": 1.0023,
1056
  "step": 720
1057
  },
1058
  {
1059
  "epoch": 4.965753424657534,
1060
+ "grad_norm": 0.357421875,
1061
  "learning_rate": 0.0001692016617099018,
1062
+ "loss": 1.0048,
1063
  "step": 725
1064
  },
1065
  {
1066
  "epoch": 5.0,
1067
+ "grad_norm": 0.431640625,
1068
  "learning_rate": 0.0001686241637868734,
1069
+ "loss": 1.0016,
1070
  "step": 730
1071
  },
1072
  {
1073
  "epoch": 5.0,
1074
+ "eval_loss": 2.5294971466064453,
1075
+ "eval_runtime": 0.5501,
1076
+ "eval_samples_per_second": 18.178,
1077
+ "eval_steps_per_second": 1.818,
1078
  "step": 730
1079
  },
1080
  {
1081
  "epoch": 5.034246575342466,
1082
+ "grad_norm": 0.380859375,
1083
  "learning_rate": 0.0001680423073214737,
1084
+ "loss": 0.9822,
1085
  "step": 735
1086
  },
1087
  {
1088
  "epoch": 5.068493150684931,
1089
+ "grad_norm": 0.369140625,
1090
  "learning_rate": 0.00016745612926928694,
1091
+ "loss": 0.9842,
1092
  "step": 740
1093
  },
1094
  {
1095
  "epoch": 5.102739726027397,
1096
+ "grad_norm": 0.38671875,
1097
  "learning_rate": 0.0001668656668603751,
1098
+ "loss": 0.9717,
1099
  "step": 745
1100
  },
1101
  {
1102
  "epoch": 5.136986301369863,
1103
+ "grad_norm": 0.375,
1104
  "learning_rate": 0.00016627095759691362,
1105
+ "loss": 0.9685,
1106
  "step": 750
1107
  },
1108
  {
1109
  "epoch": 5.171232876712328,
1110
+ "grad_norm": 0.353515625,
1111
  "learning_rate": 0.0001656720392508094,
1112
+ "loss": 0.9744,
1113
  "step": 755
1114
  },
1115
  {
1116
  "epoch": 5.205479452054795,
1117
  "grad_norm": 0.376953125,
1118
  "learning_rate": 0.00016506894986130171,
1119
+ "loss": 0.9736,
1120
  "step": 760
1121
  },
1122
  {
1123
  "epoch": 5.239726027397261,
1124
+ "grad_norm": 0.486328125,
1125
  "learning_rate": 0.00016446172773254629,
1126
+ "loss": 0.972,
1127
  "step": 765
1128
  },
1129
  {
1130
  "epoch": 5.273972602739726,
1131
+ "grad_norm": 0.470703125,
1132
  "learning_rate": 0.00016385041143118255,
1133
+ "loss": 0.9813,
1134
  "step": 770
1135
  },
1136
  {
1137
  "epoch": 5.308219178082192,
1138
+ "grad_norm": 0.5546875,
1139
  "learning_rate": 0.000163235039783884,
1140
+ "loss": 0.9855,
1141
  "step": 775
1142
  },
1143
  {
1144
  "epoch": 5.342465753424658,
1145
+ "grad_norm": 0.462890625,
1146
  "learning_rate": 0.0001626156518748922,
1147
+ "loss": 0.9765,
1148
  "step": 780
1149
  },
1150
  {
1151
  "epoch": 5.376712328767123,
1152
+ "grad_norm": 0.59375,
1153
  "learning_rate": 0.00016199228704353455,
1154
+ "loss": 0.9876,
1155
  "step": 785
1156
  },
1157
  {
1158
  "epoch": 5.410958904109589,
1159
+ "grad_norm": 0.53125,
1160
  "learning_rate": 0.00016136498488172568,
1161
+ "loss": 0.9772,
1162
  "step": 790
1163
  },
1164
  {
1165
  "epoch": 5.445205479452055,
1166
+ "grad_norm": 0.3984375,
1167
  "learning_rate": 0.0001607337852314527,
1168
+ "loss": 0.9861,
1169
  "step": 795
1170
  },
1171
  {
1172
  "epoch": 5.47945205479452,
1173
+ "grad_norm": 0.3671875,
1174
  "learning_rate": 0.00016009872818224485,
1175
+ "loss": 0.9879,
1176
  "step": 800
1177
  },
1178
  {
1179
  "epoch": 5.513698630136986,
1180
+ "grad_norm": 0.357421875,
1181
  "learning_rate": 0.00015945985406862721,
1182
+ "loss": 0.9821,
1183
  "step": 805
1184
  },
1185
  {
1186
  "epoch": 5.5479452054794525,
1187
+ "grad_norm": 0.4375,
1188
  "learning_rate": 0.00015881720346755905,
1189
+ "loss": 0.9748,
1190
  "step": 810
1191
  },
1192
  {
1193
  "epoch": 5.582191780821918,
1194
+ "grad_norm": 0.376953125,
1195
  "learning_rate": 0.00015817081719585643,
1196
+ "loss": 0.9726,
1197
  "step": 815
1198
  },
1199
  {
1200
  "epoch": 5.616438356164384,
1201
+ "grad_norm": 0.37890625,
1202
  "learning_rate": 0.00015752073630759998,
1203
+ "loss": 0.9918,
1204
  "step": 820
1205
  },
1206
  {
1207
  "epoch": 5.6506849315068495,
1208
+ "grad_norm": 0.419921875,
1209
  "learning_rate": 0.00015686700209152738,
1210
+ "loss": 0.9775,
1211
  "step": 825
1212
  },
1213
  {
1214
  "epoch": 5.684931506849315,
1215
+ "grad_norm": 0.33203125,
1216
  "learning_rate": 0.00015620965606841098,
1217
+ "loss": 0.9734,
1218
  "step": 830
1219
  },
1220
  {
1221
  "epoch": 5.719178082191781,
1222
+ "grad_norm": 0.37890625,
1223
  "learning_rate": 0.0001555487399884206,
1224
+ "loss": 0.9753,
1225
  "step": 835
1226
  },
1227
  {
1228
  "epoch": 5.7534246575342465,
1229
+ "grad_norm": 0.39453125,
1230
  "learning_rate": 0.00015488429582847192,
1231
+ "loss": 0.9701,
1232
  "step": 840
1233
  },
1234
  {
1235
  "epoch": 5.787671232876712,
1236
+ "grad_norm": 0.357421875,
1237
  "learning_rate": 0.0001542163657895605,
1238
+ "loss": 0.9726,
1239
  "step": 845
1240
  },
1241
  {
1242
  "epoch": 5.821917808219178,
1243
+ "grad_norm": 0.4375,
1244
  "learning_rate": 0.00015354499229408114,
1245
+ "loss": 0.9755,
1246
  "step": 850
1247
  },
1248
  {
1249
  "epoch": 5.8561643835616435,
1250
+ "grad_norm": 0.50390625,
1251
  "learning_rate": 0.0001528702179831338,
1252
+ "loss": 0.9733,
1253
  "step": 855
1254
  },
1255
  {
1256
  "epoch": 5.890410958904109,
1257
+ "grad_norm": 0.419921875,
1258
  "learning_rate": 0.00015219208571381525,
1259
+ "loss": 0.9795,
1260
  "step": 860
1261
  },
1262
  {
1263
  "epoch": 5.924657534246576,
1264
+ "grad_norm": 0.466796875,
1265
  "learning_rate": 0.00015151063855649698,
1266
+ "loss": 0.9906,
1267
  "step": 865
1268
  },
1269
  {
1270
  "epoch": 5.958904109589041,
1271
+ "grad_norm": 0.35546875,
1272
  "learning_rate": 0.00015082591979208976,
1273
+ "loss": 0.983,
1274
  "step": 870
1275
  },
1276
  {
1277
  "epoch": 5.993150684931507,
1278
+ "grad_norm": 0.51953125,
1279
  "learning_rate": 0.00015013797290929466,
1280
+ "loss": 0.9823,
1281
  "step": 875
1282
  },
1283
  {
1284
  "epoch": 6.0,
1285
+ "eval_loss": 2.5500409603118896,
1286
+ "eval_runtime": 0.5455,
1287
+ "eval_samples_per_second": 18.332,
1288
+ "eval_steps_per_second": 1.833,
1289
  "step": 876
1290
  },
1291
  {
1292
  "epoch": 6.027397260273973,
1293
+ "grad_norm": 0.380859375,
1294
  "learning_rate": 0.00014944684160184108,
1295
+ "loss": 0.9588,
1296
  "step": 880
1297
  },
1298
  {
1299
  "epoch": 6.061643835616438,
1300
+ "grad_norm": 0.435546875,
1301
  "learning_rate": 0.00014875256976571135,
1302
+ "loss": 0.9449,
1303
  "step": 885
1304
  },
1305
  {
1306
  "epoch": 6.095890410958904,
1307
+ "grad_norm": 0.41796875,
1308
  "learning_rate": 0.00014805520149635307,
1309
+ "loss": 0.9336,
1310
  "step": 890
1311
  },
1312
  {
1313
  "epoch": 6.13013698630137,
1314
+ "grad_norm": 0.388671875,
1315
  "learning_rate": 0.00014735478108587828,
1316
+ "loss": 0.9428,
1317
  "step": 895
1318
  },
1319
  {
1320
  "epoch": 6.164383561643835,
1321
+ "grad_norm": 0.578125,
1322
  "learning_rate": 0.00014665135302025035,
1323
+ "loss": 0.9457,
1324
  "step": 900
1325
  },
1326
  {
1327
  "epoch": 6.198630136986301,
1328
+ "grad_norm": 0.375,
1329
  "learning_rate": 0.00014594496197645852,
1330
+ "loss": 0.9425,
1331
  "step": 905
1332
  },
1333
  {
1334
  "epoch": 6.232876712328767,
1335
+ "grad_norm": 0.361328125,
1336
  "learning_rate": 0.0001452356528196804,
1337
+ "loss": 0.9492,
1338
  "step": 910
1339
  },
1340
  {
1341
  "epoch": 6.267123287671233,
1342
+ "grad_norm": 0.34375,
1343
  "learning_rate": 0.00014452347060043237,
1344
+ "loss": 0.9542,
1345
  "step": 915
1346
  },
1347
  {
1348
  "epoch": 6.301369863013699,
1349
+ "grad_norm": 0.375,
1350
  "learning_rate": 0.00014380846055170828,
1351
+ "loss": 0.9488,
1352
  "step": 920
1353
  },
1354
  {
1355
  "epoch": 6.335616438356165,
1356
+ "grad_norm": 0.56640625,
1357
  "learning_rate": 0.00014309066808610655,
1358
+ "loss": 0.9532,
1359
  "step": 925
1360
  },
1361
  {
1362
  "epoch": 6.36986301369863,
1363
+ "grad_norm": 0.451171875,
1364
  "learning_rate": 0.0001423701387929459,
1365
+ "loss": 0.954,
1366
  "step": 930
1367
  },
1368
  {
1369
  "epoch": 6.404109589041096,
1370
+ "grad_norm": 0.361328125,
1371
  "learning_rate": 0.00014164691843536982,
1372
+ "loss": 0.9513,
1373
  "step": 935
1374
  },
1375
  {
1376
  "epoch": 6.438356164383562,
1377
+ "grad_norm": 0.4375,
1378
  "learning_rate": 0.00014092105294744,
1379
+ "loss": 0.954,
1380
  "step": 940
1381
  },
1382
  {
1383
  "epoch": 6.472602739726027,
1384
+ "grad_norm": 0.404296875,
1385
  "learning_rate": 0.00014019258843121893,
1386
+ "loss": 0.9549,
1387
  "step": 945
1388
  },
1389
  {
1390
  "epoch": 6.506849315068493,
1391
+ "grad_norm": 0.38671875,
1392
  "learning_rate": 0.0001394615711538417,
1393
+ "loss": 0.9509,
1394
  "step": 950
1395
  },
1396
  {
1397
  "epoch": 6.541095890410959,
1398
+ "grad_norm": 0.376953125,
1399
  "learning_rate": 0.00013872804754457759,
1400
+ "loss": 0.9556,
1401
  "step": 955
1402
  },
1403
  {
1404
  "epoch": 6.575342465753424,
1405
+ "grad_norm": 0.400390625,
1406
  "learning_rate": 0.00013799206419188103,
1407
+ "loss": 0.9596,
1408
  "step": 960
1409
  },
1410
  {
1411
  "epoch": 6.609589041095891,
1412
+ "grad_norm": 0.37890625,
1413
  "learning_rate": 0.00013725366784043288,
1414
+ "loss": 0.9532,
1415
  "step": 965
1416
  },
1417
  {
1418
  "epoch": 6.6438356164383565,
1419
+ "grad_norm": 0.361328125,
1420
  "learning_rate": 0.00013651290538817113,
1421
+ "loss": 0.9547,
1422
  "step": 970
1423
  },
1424
  {
1425
  "epoch": 6.678082191780822,
1426
+ "grad_norm": 0.392578125,
1427
  "learning_rate": 0.0001357698238833126,
1428
+ "loss": 0.9619,
1429
  "step": 975
1430
  },
1431
  {
1432
  "epoch": 6.712328767123288,
1433
+ "grad_norm": 0.38671875,
1434
  "learning_rate": 0.00013502447052136455,
1435
+ "loss": 0.9457,
1436
  "step": 980
1437
  },
1438
  {
1439
  "epoch": 6.7465753424657535,
1440
+ "grad_norm": 0.384765625,
1441
  "learning_rate": 0.00013427689264212738,
1442
+ "loss": 0.9595,
1443
  "step": 985
1444
  },
1445
  {
1446
  "epoch": 6.780821917808219,
1447
+ "grad_norm": 0.3984375,
1448
  "learning_rate": 0.00013352713772668765,
1449
+ "loss": 0.9501,
1450
  "step": 990
1451
  },
1452
  {
1453
  "epoch": 6.815068493150685,
1454
+ "grad_norm": 0.404296875,
1455
  "learning_rate": 0.0001327752533944025,
1456
+ "loss": 0.9542,
1457
  "step": 995
1458
  },
1459
  {
1460
  "epoch": 6.8493150684931505,
1461
+ "grad_norm": 0.5546875,
1462
  "learning_rate": 0.00013202128739987532,
1463
+ "loss": 0.957,
1464
  "step": 1000
1465
  },
1466
  {
1467
  "epoch": 6.883561643835616,
1468
+ "grad_norm": 0.388671875,
1469
  "learning_rate": 0.00013126528762992247,
1470
+ "loss": 0.9597,
1471
  "step": 1005
1472
  },
1473
  {
1474
  "epoch": 6.917808219178082,
1475
+ "grad_norm": 0.4140625,
1476
  "learning_rate": 0.0001305073021005321,
1477
+ "loss": 0.9525,
1478
  "step": 1010
1479
  },
1480
  {
1481
  "epoch": 6.9520547945205475,
1482
+ "grad_norm": 0.400390625,
1483
  "learning_rate": 0.0001297473789538142,
1484
+ "loss": 0.9554,
1485
  "step": 1015
1486
  },
1487
  {
1488
  "epoch": 6.986301369863014,
1489
+ "grad_norm": 0.37890625,
1490
  "learning_rate": 0.00012898556645494325,
1491
+ "loss": 0.955,
1492
  "step": 1020
1493
  },
1494
  {
1495
  "epoch": 7.0,
1496
+ "eval_loss": 2.5866098403930664,
1497
+ "eval_runtime": 0.5603,
1498
+ "eval_samples_per_second": 17.847,
1499
+ "eval_steps_per_second": 1.785,
1500
  "step": 1022
1501
  },
1502
  {
1503
  "epoch": 7.02054794520548,
1504
+ "grad_norm": 0.380859375,
1505
  "learning_rate": 0.0001282219129890925,
1506
+ "loss": 0.9357,
1507
  "step": 1025
1508
  },
1509
  {
1510
  "epoch": 7.054794520547945,
1511
+ "grad_norm": 0.373046875,
1512
  "learning_rate": 0.00012745646705836097,
1513
+ "loss": 0.9228,
1514
  "step": 1030
1515
  },
1516
  {
1517
  "epoch": 7.089041095890411,
1518
+ "grad_norm": 0.5390625,
1519
  "learning_rate": 0.0001266892772786929,
1520
+ "loss": 0.9121,
1521
  "step": 1035
1522
  },
1523
  {
1524
  "epoch": 7.123287671232877,
1525
+ "grad_norm": 0.37109375,
1526
  "learning_rate": 0.0001259203923767901,
1527
+ "loss": 0.9181,
1528
  "step": 1040
1529
  },
1530
  {
1531
  "epoch": 7.157534246575342,
1532
+ "grad_norm": 0.37109375,
1533
  "learning_rate": 0.00012514986118701695,
1534
+ "loss": 0.9176,
1535
  "step": 1045
1536
  },
1537
  {
1538
  "epoch": 7.191780821917808,
1539
+ "grad_norm": 0.3984375,
1540
  "learning_rate": 0.00012437773264829897,
1541
+ "loss": 0.9241,
1542
  "step": 1050
1543
  },
1544
  {
1545
  "epoch": 7.226027397260274,
1546
+ "grad_norm": 0.376953125,
1547
  "learning_rate": 0.00012360405580101448,
1548
+ "loss": 0.9287,
1549
  "step": 1055
1550
  },
1551
  {
1552
  "epoch": 7.260273972602739,
1553
+ "grad_norm": 0.375,
1554
  "learning_rate": 0.00012282887978387976,
1555
+ "loss": 0.9347,
1556
  "step": 1060
1557
  },
1558
  {
1559
  "epoch": 7.294520547945205,
1560
+ "grad_norm": 0.3984375,
1561
  "learning_rate": 0.00012205225383082843,
1562
+ "loss": 0.9275,
1563
  "step": 1065
1564
  },
1565
  {
1566
  "epoch": 7.328767123287671,
1567
+ "grad_norm": 0.404296875,
1568
  "learning_rate": 0.000121274227267884,
1569
+ "loss": 0.923,
1570
  "step": 1070
1571
  },
1572
  {
1573
  "epoch": 7.363013698630137,
1574
+ "grad_norm": 0.388671875,
1575
  "learning_rate": 0.00012049484951002739,
1576
+ "loss": 0.9332,
1577
  "step": 1075
1578
  },
1579
  {
1580
  "epoch": 7.397260273972603,
1581
+ "grad_norm": 0.37890625,
1582
  "learning_rate": 0.00011971417005805818,
1583
+ "loss": 0.9238,
1584
  "step": 1080
1585
  },
1586
  {
1587
  "epoch": 7.431506849315069,
1588
+ "grad_norm": 0.37109375,
1589
  "learning_rate": 0.00011893223849545084,
1590
+ "loss": 0.9278,
1591
  "step": 1085
1592
  },
1593
  {
1594
  "epoch": 7.465753424657534,
1595
+ "grad_norm": 0.388671875,
1596
  "learning_rate": 0.00011814910448520536,
1597
+ "loss": 0.9268,
1598
  "step": 1090
1599
  },
1600
  {
1601
  "epoch": 7.5,
1602
  "grad_norm": 0.404296875,
1603
  "learning_rate": 0.00011736481776669306,
1604
+ "loss": 0.931,
1605
  "step": 1095
1606
  },
1607
  {
1608
  "epoch": 7.534246575342466,
1609
+ "grad_norm": 0.390625,
1610
  "learning_rate": 0.00011657942815249754,
1611
+ "loss": 0.9283,
1612
  "step": 1100
1613
  },
1614
  {
1615
  "epoch": 7.568493150684931,
1616
+ "grad_norm": 0.369140625,
1617
  "learning_rate": 0.00011579298552525084,
1618
+ "loss": 0.9246,
1619
  "step": 1105
1620
  },
1621
  {
1622
  "epoch": 7.602739726027397,
1623
+ "grad_norm": 0.390625,
1624
  "learning_rate": 0.00011500553983446527,
1625
+ "loss": 0.9293,
1626
  "step": 1110
1627
  },
1628
  {
1629
  "epoch": 7.636986301369863,
1630
+ "grad_norm": 0.365234375,
1631
  "learning_rate": 0.00011421714109336097,
1632
+ "loss": 0.9335,
1633
  "step": 1115
1634
  },
1635
  {
1636
  "epoch": 7.671232876712329,
1637
+ "grad_norm": 0.453125,
1638
  "learning_rate": 0.00011342783937568926,
1639
+ "loss": 0.9359,
1640
  "step": 1120
1641
  },
1642
  {
1643
  "epoch": 7.705479452054795,
1644
+ "grad_norm": 0.416015625,
1645
  "learning_rate": 0.00011263768481255264,
1646
+ "loss": 0.9295,
1647
  "step": 1125
1648
  },
1649
  {
1650
  "epoch": 7.739726027397261,
1651
+ "grad_norm": 0.380859375,
1652
  "learning_rate": 0.00011184672758922034,
1653
+ "loss": 0.9404,
1654
  "step": 1130
1655
  },
1656
  {
1657
  "epoch": 7.773972602739726,
1658
+ "grad_norm": 0.396484375,
1659
  "learning_rate": 0.00011105501794194131,
1660
+ "loss": 0.9289,
1661
  "step": 1135
1662
  },
1663
  {
1664
  "epoch": 7.808219178082192,
1665
+ "grad_norm": 0.39453125,
1666
  "learning_rate": 0.00011026260615475333,
1667
+ "loss": 0.9409,
1668
  "step": 1140
1669
  },
1670
  {
1671
  "epoch": 7.842465753424658,
1672
+ "grad_norm": 0.396484375,
1673
  "learning_rate": 0.00010946954255628928,
1674
+ "loss": 0.9355,
1675
  "step": 1145
1676
  },
1677
  {
1678
  "epoch": 7.876712328767123,
1679
+ "grad_norm": 0.443359375,
1680
  "learning_rate": 0.00010867587751658079,
1681
+ "loss": 0.9257,
1682
  "step": 1150
1683
  },
1684
  {
1685
  "epoch": 7.910958904109589,
1686
+ "grad_norm": 0.365234375,
1687
  "learning_rate": 0.00010788166144385888,
1688
+ "loss": 0.924,
1689
  "step": 1155
1690
  },
1691
  {
1692
  "epoch": 7.945205479452055,
1693
+ "grad_norm": 0.427734375,
1694
  "learning_rate": 0.0001070869447813525,
1695
+ "loss": 0.9202,
1696
  "step": 1160
1697
  },
1698
  {
1699
  "epoch": 7.97945205479452,
1700
+ "grad_norm": 0.3515625,
1701
  "learning_rate": 0.0001062917780040847,
1702
+ "loss": 0.9214,
1703
  "step": 1165
1704
  },
1705
  {
1706
  "epoch": 8.0,
1707
+ "eval_loss": 2.6224260330200195,
1708
+ "eval_runtime": 0.5566,
1709
+ "eval_samples_per_second": 17.965,
1710
+ "eval_steps_per_second": 1.797,
1711
  "step": 1168
1712
  },
1713
  {
1714
  "epoch": 8.013698630136986,
1715
+ "grad_norm": 0.388671875,
1716
  "learning_rate": 0.0001054962116156667,
1717
+ "loss": 0.9133,
1718
  "step": 1170
1719
  },
1720
  {
1721
  "epoch": 8.047945205479452,
1722
  "grad_norm": 0.41796875,
1723
  "learning_rate": 0.00010470029614509041,
1724
+ "loss": 0.8952,
1725
  "step": 1175
1726
  },
1727
  {
1728
  "epoch": 8.082191780821917,
1729
+ "grad_norm": 0.3984375,
1730
  "learning_rate": 0.00010390408214351892,
1731
+ "loss": 0.8963,
1732
  "step": 1180
1733
  },
1734
  {
1735
  "epoch": 8.116438356164384,
1736
+ "grad_norm": 0.388671875,
1737
  "learning_rate": 0.0001031076201810762,
1738
+ "loss": 0.8996,
1739
  "step": 1185
1740
  },
1741
  {
1742
  "epoch": 8.150684931506849,
1743
+ "grad_norm": 0.38671875,
1744
  "learning_rate": 0.00010231096084363483,
1745
+ "loss": 0.8898,
1746
  "step": 1190
1747
  },
1748
  {
1749
  "epoch": 8.184931506849315,
1750
+ "grad_norm": 0.390625,
1751
  "learning_rate": 0.00010151415472960342,
1752
+ "loss": 0.9138,
1753
  "step": 1195
1754
  },
1755
  {
1756
  "epoch": 8.219178082191782,
1757
+ "grad_norm": 0.388671875,
1758
  "learning_rate": 0.00010071725244671282,
1759
+ "loss": 0.9023,
1760
  "step": 1200
1761
  },
1762
  {
1763
  "epoch": 8.253424657534246,
1764
+ "grad_norm": 0.388671875,
1765
  "learning_rate": 9.992030460880181e-05,
1766
+ "loss": 0.8929,
1767
  "step": 1205
1768
  },
1769
  {
1770
  "epoch": 8.287671232876713,
1771
+ "grad_norm": 0.392578125,
1772
  "learning_rate": 9.91233618326026e-05,
1773
+ "loss": 0.9089,
1774
  "step": 1210
1775
  },
1776
  {
1777
  "epoch": 8.321917808219178,
1778
+ "grad_norm": 0.41015625,
1779
  "learning_rate": 9.83264747345259e-05,
1780
+ "loss": 0.9037,
1781
  "step": 1215
1782
  },
1783
  {
1784
  "epoch": 8.356164383561644,
1785
+ "grad_norm": 0.369140625,
1786
  "learning_rate": 9.752969392744606e-05,
1787
+ "loss": 0.9062,
1788
  "step": 1220
1789
  },
1790
  {
1791
  "epoch": 8.39041095890411,
1792
+ "grad_norm": 0.40234375,
1793
  "learning_rate": 9.673307001748661e-05,
1794
+ "loss": 0.8982,
1795
  "step": 1225
1796
  },
1797
  {
1798
  "epoch": 8.424657534246576,
1799
+ "grad_norm": 0.392578125,
1800
  "learning_rate": 9.593665360080599e-05,
1801
+ "loss": 0.8994,
1802
  "step": 1230
1803
  },
1804
  {
1805
  "epoch": 8.45890410958904,
1806
+ "grad_norm": 0.4140625,
1807
  "learning_rate": 9.514049526038418e-05,
1808
+ "loss": 0.9045,
1809
  "step": 1235
1810
  },
1811
  {
1812
  "epoch": 8.493150684931507,
1813
+ "grad_norm": 0.400390625,
1814
  "learning_rate": 9.43446455628097e-05,
1815
+ "loss": 0.9062,
1816
  "step": 1240
1817
  },
1818
  {
1819
  "epoch": 8.527397260273972,
1820
+ "grad_norm": 0.427734375,
1821
  "learning_rate": 9.354915505506839e-05,
1822
+ "loss": 0.9056,
1823
  "step": 1245
1824
  },
1825
  {
1826
  "epoch": 8.561643835616438,
1827
+ "grad_norm": 0.3828125,
1828
  "learning_rate": 9.27540742613326e-05,
1829
+ "loss": 0.9078,
1830
  "step": 1250
1831
  },
1832
  {
1833
  "epoch": 8.595890410958905,
1834
+ "grad_norm": 0.431640625,
1835
  "learning_rate": 9.195945367975256e-05,
1836
+ "loss": 0.8994,
1837
  "step": 1255
1838
  },
1839
  {
1840
  "epoch": 8.63013698630137,
1841
+ "grad_norm": 0.404296875,
1842
  "learning_rate": 9.116534377924883e-05,
1843
+ "loss": 0.9088,
1844
  "step": 1260
1845
  },
1846
  {
1847
  "epoch": 8.664383561643836,
1848
+ "grad_norm": 0.44921875,
1849
  "learning_rate": 9.037179499630703e-05,
1850
+ "loss": 0.9035,
1851
  "step": 1265
1852
  },
1853
  {
1854
  "epoch": 8.698630136986301,
1855
+ "grad_norm": 0.40625,
1856
  "learning_rate": 8.957885773177438e-05,
1857
+ "loss": 0.9178,
1858
  "step": 1270
1859
  },
1860
  {
1861
  "epoch": 8.732876712328768,
1862
+ "grad_norm": 0.51953125,
1863
  "learning_rate": 8.878658234765858e-05,
1864
+ "loss": 0.9062,
1865
  "step": 1275
1866
  },
1867
  {
1868
  "epoch": 8.767123287671232,
1869
+ "grad_norm": 0.486328125,
1870
  "learning_rate": 8.799501916392912e-05,
1871
+ "loss": 0.9157,
1872
  "step": 1280
1873
  },
1874
  {
1875
  "epoch": 8.801369863013699,
1876
+ "grad_norm": 0.392578125,
1877
  "learning_rate": 8.720421845532151e-05,
1878
+ "loss": 0.912,
1879
  "step": 1285
1880
  },
1881
  {
1882
  "epoch": 8.835616438356164,
1883
+ "grad_norm": 0.37109375,
1884
  "learning_rate": 8.641423044814374e-05,
1885
+ "loss": 0.9085,
1886
  "step": 1290
1887
  },
1888
  {
1889
  "epoch": 8.86986301369863,
1890
+ "grad_norm": 0.396484375,
1891
  "learning_rate": 8.562510531708677e-05,
1892
+ "loss": 0.9158,
1893
  "step": 1295
1894
  },
1895
  {
1896
  "epoch": 8.904109589041095,
1897
+ "grad_norm": 0.384765625,
1898
  "learning_rate": 8.48368931820373e-05,
1899
+ "loss": 0.909,
1900
  "step": 1300
1901
  },
1902
  {
1903
  "epoch": 8.938356164383562,
1904
+ "grad_norm": 0.39453125,
1905
  "learning_rate": 8.404964410489485e-05,
1906
+ "loss": 0.9121,
1907
  "step": 1305
1908
  },
1909
  {
1910
  "epoch": 8.972602739726028,
1911
+ "grad_norm": 0.39453125,
1912
  "learning_rate": 8.32634080863919e-05,
1913
+ "loss": 0.913,
1914
  "step": 1310
1915
  },
1916
  {
1917
  "epoch": 9.0,
1918
+ "eval_loss": 2.6512458324432373,
1919
+ "eval_runtime": 0.5534,
1920
+ "eval_samples_per_second": 18.07,
1921
+ "eval_steps_per_second": 1.807,
1922
  "step": 1314
1923
  },
1924
  {
1925
  "epoch": 9.006849315068493,
1926
+ "grad_norm": 0.408203125,
1927
  "learning_rate": 8.247823506291844e-05,
1928
+ "loss": 0.9034,
1929
  "step": 1315
1930
  },
1931
  {
1932
  "epoch": 9.04109589041096,
1933
+ "grad_norm": 0.404296875,
1934
  "learning_rate": 8.169417490335007e-05,
1935
+ "loss": 0.8821,
1936
  "step": 1320
1937
  },
1938
  {
1939
  "epoch": 9.075342465753424,
1940
+ "grad_norm": 0.416015625,
1941
  "learning_rate": 8.091127740588094e-05,
1942
+ "loss": 0.8702,
1943
  "step": 1325
1944
  },
1945
  {
1946
  "epoch": 9.10958904109589,
1947
+ "grad_norm": 0.39453125,
1948
  "learning_rate": 8.012959229486061e-05,
1949
+ "loss": 0.8755,
1950
  "step": 1330
1951
  },
1952
  {
1953
  "epoch": 9.143835616438356,
1954
+ "grad_norm": 0.43359375,
1955
  "learning_rate": 7.934916921763628e-05,
1956
+ "loss": 0.8783,
1957
  "step": 1335
1958
  },
1959
  {
1960
  "epoch": 9.178082191780822,
1961
+ "grad_norm": 0.421875,
1962
  "learning_rate": 7.857005774139907e-05,
1963
+ "loss": 0.8794,
1964
  "step": 1340
1965
  },
1966
  {
1967
  "epoch": 9.212328767123287,
1968
+ "grad_norm": 0.400390625,
1969
  "learning_rate": 7.779230735003628e-05,
1970
+ "loss": 0.8844,
1971
  "step": 1345
1972
  },
1973
  {
1974
  "epoch": 9.246575342465754,
1975
+ "grad_norm": 0.3984375,
1976
  "learning_rate": 7.701596744098818e-05,
1977
+ "loss": 0.8775,
1978
  "step": 1350
1979
  },
1980
  {
1981
  "epoch": 9.280821917808218,
1982
+ "grad_norm": 0.404296875,
1983
  "learning_rate": 7.624108732211081e-05,
1984
+ "loss": 0.8705,
1985
  "step": 1355
1986
  },
1987
  {
1988
  "epoch": 9.315068493150685,
1989
+ "grad_norm": 0.408203125,
1990
  "learning_rate": 7.54677162085442e-05,
1991
+ "loss": 0.8897,
1992
  "step": 1360
1993
  },
1994
  {
1995
  "epoch": 9.349315068493151,
1996
+ "grad_norm": 0.3984375,
1997
  "learning_rate": 7.469590321958662e-05,
1998
+ "loss": 0.882,
1999
  "step": 1365
2000
  },
2001
  {
2002
  "epoch": 9.383561643835616,
2003
  "grad_norm": 0.43359375,
2004
  "learning_rate": 7.392569737557474e-05,
2005
+ "loss": 0.8879,
2006
  "step": 1370
2007
  },
2008
  {
2009
  "epoch": 9.417808219178083,
2010
+ "grad_norm": 0.416015625,
2011
  "learning_rate": 7.31571475947703e-05,
2012
+ "loss": 0.8827,
2013
  "step": 1375
2014
  },
2015
  {
2016
  "epoch": 9.452054794520548,
2017
+ "grad_norm": 0.412109375,
2018
  "learning_rate": 7.239030269025311e-05,
2019
+ "loss": 0.8805,
2020
  "step": 1380
2021
  },
2022
  {
2023
  "epoch": 9.486301369863014,
2024
+ "grad_norm": 0.408203125,
2025
  "learning_rate": 7.162521136682085e-05,
2026
+ "loss": 0.8966,
2027
  "step": 1385
2028
  },
2029
  {
2030
  "epoch": 9.520547945205479,
2031
+ "grad_norm": 0.431640625,
2032
  "learning_rate": 7.08619222178954e-05,
2033
+ "loss": 0.8895,
2034
  "step": 1390
2035
  },
2036
  {
2037
  "epoch": 9.554794520547945,
2038
+ "grad_norm": 0.423828125,
2039
  "learning_rate": 7.010048372243698e-05,
2040
+ "loss": 0.8907,
2041
  "step": 1395
2042
  },
2043
  {
2044
  "epoch": 9.58904109589041,
2045
+ "grad_norm": 0.42578125,
2046
  "learning_rate": 6.934094424186459e-05,
2047
+ "loss": 0.8876,
2048
  "step": 1400
2049
  },
2050
  {
2051
  "epoch": 9.623287671232877,
2052
+ "grad_norm": 0.39453125,
2053
  "learning_rate": 6.858335201698485e-05,
2054
+ "loss": 0.8936,
2055
  "step": 1405
2056
  },
2057
  {
2058
  "epoch": 9.657534246575342,
2059
+ "grad_norm": 0.451171875,
2060
  "learning_rate": 6.782775516492771e-05,
2061
+ "loss": 0.8804,
2062
  "step": 1410
2063
  },
2064
  {
2065
  "epoch": 9.691780821917808,
2066
+ "grad_norm": 0.40234375,
2067
  "learning_rate": 6.70742016760907e-05,
2068
+ "loss": 0.8907,
2069
  "step": 1415
2070
  },
2071
  {
2072
  "epoch": 9.726027397260275,
2073
+ "grad_norm": 0.4453125,
2074
  "learning_rate": 6.632273941109064e-05,
2075
+ "loss": 0.8756,
2076
  "step": 1420
2077
  },
2078
  {
2079
  "epoch": 9.76027397260274,
2080
+ "grad_norm": 0.40625,
2081
  "learning_rate": 6.5573416097724e-05,
2082
+ "loss": 0.8963,
2083
  "step": 1425
2084
  },
2085
  {
2086
  "epoch": 9.794520547945206,
2087
+ "grad_norm": 0.412109375,
2088
  "learning_rate": 6.482627932793553e-05,
2089
+ "loss": 0.8998,
2090
  "step": 1430
2091
  },
2092
  {
2093
  "epoch": 9.82876712328767,
2094
+ "grad_norm": 0.419921875,
2095
  "learning_rate": 6.408137655479554e-05,
2096
+ "loss": 0.9024,
2097
  "step": 1435
2098
  },
2099
  {
2100
  "epoch": 9.863013698630137,
2101
+ "grad_norm": 0.421875,
2102
  "learning_rate": 6.333875508948593e-05,
2103
+ "loss": 0.8921,
2104
  "step": 1440
2105
  },
2106
  {
2107
  "epoch": 9.897260273972602,
2108
+ "grad_norm": 0.45703125,
2109
  "learning_rate": 6.259846209829551e-05,
2110
+ "loss": 0.904,
2111
  "step": 1445
2112
  },
2113
  {
2114
  "epoch": 9.931506849315069,
2115
+ "grad_norm": 0.4140625,
2116
  "learning_rate": 6.186054459962399e-05,
2117
+ "loss": 0.8899,
2118
  "step": 1450
2119
  },
2120
  {
2121
  "epoch": 9.965753424657533,
2122
+ "grad_norm": 0.40625,
2123
  "learning_rate": 6.112504946099604e-05,
2124
+ "loss": 0.8875,
2125
  "step": 1455
2126
  },
2127
  {
2128
  "epoch": 10.0,
2129
+ "grad_norm": 0.431640625,
2130
  "learning_rate": 6.039202339608432e-05,
2131
+ "loss": 0.889,
2132
  "step": 1460
2133
  },
2134
  {
2135
  "epoch": 10.0,
2136
+ "eval_loss": 2.6852145195007324,
2137
+ "eval_runtime": 0.5511,
2138
+ "eval_samples_per_second": 18.146,
2139
+ "eval_steps_per_second": 1.815,
2140
  "step": 1460
2141
  },
2142
  {
2143
  "epoch": 10.034246575342467,
2144
+ "grad_norm": 0.40625,
2145
  "learning_rate": 5.966151296174268e-05,
2146
+ "loss": 0.8664,
2147
  "step": 1465
2148
  },
2149
  {
2150
  "epoch": 10.068493150684931,
2151
+ "grad_norm": 0.431640625,
2152
  "learning_rate": 5.8933564555049105e-05,
2153
+ "loss": 0.8677,
2154
  "step": 1470
2155
  },
2156
  {
2157
  "epoch": 10.102739726027398,
2158
+ "grad_norm": 0.41796875,
2159
  "learning_rate": 5.820822441035899e-05,
2160
+ "loss": 0.866,
2161
  "step": 1475
2162
  },
2163
  {
2164
  "epoch": 10.136986301369863,
2165
+ "grad_norm": 0.40625,
2166
  "learning_rate": 5.7485538596368496e-05,
2167
+ "loss": 0.8664,
2168
  "step": 1480
2169
  },
2170
  {
2171
  "epoch": 10.17123287671233,
2172
  "grad_norm": 0.41015625,
2173
  "learning_rate": 5.6765553013188766e-05,
2174
+ "loss": 0.8645,
2175
  "step": 1485
2176
  },
2177
  {
2178
  "epoch": 10.205479452054794,
2179
+ "grad_norm": 0.400390625,
2180
  "learning_rate": 5.6048313389430484e-05,
2181
+ "loss": 0.8624,
2182
  "step": 1490
2183
  },
2184
  {
2185
  "epoch": 10.23972602739726,
2186
+ "grad_norm": 0.408203125,
2187
  "learning_rate": 5.533386527929962e-05,
2188
+ "loss": 0.874,
2189
  "step": 1495
2190
  },
2191
  {
2192
  "epoch": 10.273972602739725,
2193
+ "grad_norm": 0.40625,
2194
  "learning_rate": 5.462225405970401e-05,
2195
+ "loss": 0.8708,
2196
  "step": 1500
2197
  },
2198
  {
2199
+ "epoch": 10.273972602739725,
2200
+ "step": 1500,
2201
+ "total_flos": 8.853977907740017e+17,
2202
+ "train_loss": 0.0,
2203
+ "train_runtime": 2.8738,
2204
+ "train_samples_per_second": 24365.255,
2205
+ "train_steps_per_second": 508.044
 
2206
  }
2207
  ],
2208
  "logging_steps": 5,
2209
+ "max_steps": 1460,
2210
  "num_input_tokens_seen": 0,
2211
+ "num_train_epochs": 10,
2212
  "save_steps": 100,
2213
  "stateful_callbacks": {
2214
  "TrainerControl": {
 
2222
  "attributes": {}
2223
  }
2224
  },
2225
+ "total_flos": 8.853977907740017e+17,
2226
  "train_batch_size": 8,
2227
  "trial_name": null,
2228
  "trial_params": null
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9701d374006488c3c752f343e751a9352e46b8cc32754ddb0a8fe6f15b6bcfc7
3
  size 5304
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ae681e8a9654a4e133111dfcf66660b6b957d55bd252fdde7dac3732a4ad91c9
3
  size 5304