Commit 62bd1a3 by ZhangShenao (1 parent: 90d8786)

Model save

Files changed (5):
  1. README.md +37 -37
  2. all_results.json +7 -7
  3. generation_config.json +1 -1
  4. train_results.json +7 -7
  5. trainer_state.json +521 -2488
README.md CHANGED
@@ -1,57 +1,57 @@
  ---
- license: gemma
  base_model: google/gemma-2-2b-it
  tags:
  - trl
  - sft
- - generated_from_trainer
- model-index:
- - name: gemma-2-2b-it-sft-m
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # gemma-2-2b-it-sft-m
-
- This model is a fine-tuned version of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) on an unknown dataset.
-
- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data
-
- More information needed

  ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 6

- ### Training results

- ### Framework versions

- - Transformers 4.43.4
- - Pytorch 2.4.1+cu121
- - Datasets 3.0.1
- - Tokenizers 0.19.1

  ---
  base_model: google/gemma-2-2b-it
+ library_name: transformers
+ model_name: gemma-2-2b-it-sft-m
  tags:
+ - generated_from_trainer
  - trl
  - sft
+ licence: license
  ---

+ # Model Card for gemma-2-2b-it-sft-m

+ This model is a fine-tuned version of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it).
+ It has been trained using [TRL](https://github.com/huggingface/trl).

+ ## Quick start

+ ```python
+ from transformers import pipeline

+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="ZhangShenao/gemma-2-2b-it-sft-m", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```

  ## Training procedure

+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yutongyin/huggingface/runs/7oikn82h)
+
+ This model was trained with SFT.

+ ### Framework versions

+ - TRL: 0.12.0
+ - Transformers: 4.46.1
+ - Pytorch: 2.4.0
+ - Datasets: 3.0.2
+ - Tokenizers: 0.20.1

+ ## Citations

+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
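The training hyperparameters dropped from the card (old side above) can still be cross-checked against the logs in this commit. The sketch below assumes a single device (8 × 4 = 32, the listed total_train_batch_size), no dropped last batch, and a cosine schedule with linear warmup whose warmup length is total steps × warmup_ratio rounded up. Under those assumptions it reproduces the old run's 2,046 optimizer steps (global_step in trainer_state.json), the peak learning rate of 2e-05 at step 205, and the logged learning rates. The helper names are illustrative, not part of any library.

```python
import math

# Values from the removed "Training hyperparameters" list (old run).
BASE_LR = 2e-5
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 4
NUM_EPOCHS = 6
WARMUP_RATIO = 0.1
NUM_DEVICES = 1  # assumption: 8 * 4 * 1 matches total_train_batch_size = 32


def schedule_lengths(samples, batch=PER_DEVICE_BATCH, accum=GRAD_ACCUM, epochs=NUM_EPOCHS):
    """Return (total optimizer steps, warmup steps), assuming no batch is dropped."""
    batches_per_epoch = math.ceil(samples / (batch * NUM_DEVICES))
    # Floor division: leftover batches that do not fill an accumulation window add no step.
    updates_per_epoch = batches_per_epoch // accum
    total_steps = updates_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return total_steps, warmup_steps


def cosine_lr(step, total_steps, warmup_steps, base_lr=BASE_LR):
    """Linear warmup to base_lr, then a half-cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def fractional_epoch(step, samples, batch=PER_DEVICE_BATCH, accum=GRAD_ACCUM):
    """The 'epoch' field in log_history appears to equal step / (batches per epoch / accum)."""
    return step / (math.ceil(samples / (batch * NUM_DEVICES)) / accum)


total, warmup = schedule_lengths(10921)  # 10,921 = train_samples of the old run
print(total, warmup)                     # 2046 205
print(cosine_lr(5, total, warmup))       # compare with the step-5 entry in trainer_state.json
print(cosine_lr(210, total, warmup))     # compare with the step-210 entry
```

Applied to the new run's 3,448 samples, `schedule_lengths` returns (642, 65), which is consistent with the new epoch value of 5.958236658932715 (642 / 107.75). The new card no longer lists its hyperparameters, so treat the new-run figures as an inference.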
all_results.json CHANGED
@@ -1,9 +1,9 @@
  {
- "epoch": 5.991215226939971,
- "total_flos": 2.3380471586755123e+17,
- "train_loss": 0.343939907065905,
- "train_runtime": 7450.6058,
- "train_samples": 10921,
- "train_samples_per_second": 8.795,
- "train_steps_per_second": 0.275
  }

  {
+ "epoch": 5.958236658932715,
+ "total_flos": 6.526533872789914e+16,
+ "train_loss": 0.40725265422435564,
+ "train_runtime": 2315.0788,
+ "train_samples": 3448,
+ "train_samples_per_second": 8.936,
+ "train_steps_per_second": 0.277
  }
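The two throughput fields can be recomputed from the other values. This sketch assumes, as Hugging Face Trainer's speed metrics appear to do when no max_steps is set, that samples per second counts every training sample once per epoch and that both rates are rounded to three decimals. The 2,046 steps of the old run come from global_step in trainer_state.json; the 642 steps of the new run are inferred from its fractional epoch and are an assumption. The function name is illustrative.

```python
def throughput(train_samples, num_epochs, total_steps, runtime_s):
    """Return (samples/s, steps/s), rounded to 3 decimals, counting each sample once per epoch."""
    samples_per_s = round(train_samples * num_epochs / runtime_s, 3)
    steps_per_s = round(total_steps / runtime_s, 3)
    return samples_per_s, steps_per_s


print(throughput(10921, 6, 2046, 7450.6058))  # old run: (8.795, 0.275)
print(throughput(3448, 6, 642, 2315.0788))    # new run (642 steps inferred): (8.936, 0.277)
```

Both runs process roughly 8.8 to 8.9 samples per second, so the drop in train_runtime from about 2.07 hours to about 38.6 minutes mostly reflects the smaller training set (3,448 instead of 10,921 samples) rather than faster training.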
generation_config.json CHANGED
@@ -7,5 +7,5 @@
  107
  ],
  "pad_token_id": 0,
- "transformers_version": "4.43.4"
  }

  107
  ],
  "pad_token_id": 0,
+ "transformers_version": "4.46.1"
  }
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
- "epoch": 5.991215226939971,
- "total_flos": 2.3380471586755123e+17,
- "train_loss": 0.343939907065905,
- "train_runtime": 7450.6058,
- "train_samples": 10921,
- "train_samples_per_second": 8.795,
- "train_steps_per_second": 0.275
  }

  {
+ "epoch": 5.958236658932715,
+ "total_flos": 6.526533872789914e+16,
+ "train_loss": 0.40725265422435564,
+ "train_runtime": 2315.0788,
+ "train_samples": 3448,
+ "train_samples_per_second": 8.936,
+ "train_steps_per_second": 0.277
  }
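In trainer_state.json, log_history holds one entry every five optimizer steps with epoch, grad_norm, learning_rate, loss and step. A minimal standard-library sketch for averaging the loss per whole epoch follows; the function name is illustrative, and the sample is a four-entry excerpt copied verbatim from the old run's log (steps 5, 10, 345 and 350).

```python
import json
from collections import defaultdict


def mean_loss_per_epoch(state):
    """Average the logged training loss over each whole epoch in log_history."""
    buckets = defaultdict(list)
    for entry in state["log_history"]:
        # Skip entries without "loss", e.g. the final summary, which typically carries train_loss.
        if "loss" in entry:
            buckets[int(entry["epoch"])].append(entry["loss"])
    return {epoch: sum(losses) / len(losses) for epoch, losses in sorted(buckets.items())}


# For the real file: state = json.load(open("trainer_state.json"))
sample = json.loads("""
{"log_history": [
  {"epoch": 0.014641288433382138, "grad_norm": 386.0, "learning_rate": 4.878048780487805e-07, "loss": 2.4491, "step": 5},
  {"epoch": 0.029282576866764276, "grad_norm": 320.0, "learning_rate": 9.75609756097561e-07, "loss": 2.4222, "step": 10},
  {"epoch": 1.0102489019033676, "grad_norm": 2.828125, "learning_rate": 1.9715978225443147e-05, "loss": 0.6681, "step": 345},
  {"epoch": 1.0248901903367496, "grad_norm": 2.953125, "learning_rate": 1.969543412445506e-05, "loss": 0.5843, "step": 350}
]}
""")
print(mean_loss_per_epoch(sample))  # roughly {0: 2.436, 1: 0.626}
```

Over the full old log, this makes the step changes at epoch boundaries easy to spot: the last entries of epoch 1 sit around 0.48 to 0.55, while the first full entries of epoch 2 fall to roughly 0.26 to 0.28.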
trainer_state.json CHANGED
@@ -1,2888 +1,921 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 5.991215226939971,
5
  "eval_steps": 500,
6
- "global_step": 2046,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.014641288433382138,
13
- "grad_norm": 386.0,
14
- "learning_rate": 4.878048780487805e-07,
15
- "loss": 2.4491,
16
  "step": 5
17
  },
18
  {
19
- "epoch": 0.029282576866764276,
20
- "grad_norm": 320.0,
21
- "learning_rate": 9.75609756097561e-07,
22
- "loss": 2.4222,
23
  "step": 10
24
  },
25
  {
26
- "epoch": 0.043923865300146414,
27
- "grad_norm": 153.0,
28
- "learning_rate": 1.4634146341463414e-06,
29
- "loss": 2.4019,
30
  "step": 15
31
  },
32
  {
33
- "epoch": 0.05856515373352855,
34
- "grad_norm": 134.0,
35
- "learning_rate": 1.951219512195122e-06,
36
- "loss": 2.2314,
37
  "step": 20
38
  },
39
  {
40
- "epoch": 0.07320644216691069,
41
- "grad_norm": 266.0,
42
- "learning_rate": 2.4390243902439027e-06,
43
- "loss": 1.9584,
44
  "step": 25
45
  },
46
  {
47
- "epoch": 0.08784773060029283,
48
- "grad_norm": 208.0,
49
- "learning_rate": 2.926829268292683e-06,
50
- "loss": 1.6917,
51
  "step": 30
52
  },
53
  {
54
- "epoch": 0.10248901903367497,
55
- "grad_norm": 81.5,
56
- "learning_rate": 3.414634146341464e-06,
57
- "loss": 1.5312,
58
  "step": 35
59
  },
60
  {
61
- "epoch": 0.1171303074670571,
62
- "grad_norm": 6.78125,
63
- "learning_rate": 3.902439024390244e-06,
64
- "loss": 1.3735,
65
  "step": 40
66
  },
67
  {
68
- "epoch": 0.13177159590043924,
69
- "grad_norm": 4.96875,
70
- "learning_rate": 4.390243902439025e-06,
71
- "loss": 1.1638,
72
  "step": 45
73
  },
74
  {
75
- "epoch": 0.14641288433382138,
76
- "grad_norm": 4.78125,
77
- "learning_rate": 4.8780487804878055e-06,
78
- "loss": 1.0906,
79
  "step": 50
80
  },
81
  {
82
- "epoch": 0.16105417276720352,
83
- "grad_norm": 4.4375,
84
- "learning_rate": 5.365853658536586e-06,
85
- "loss": 1.0331,
86
  "step": 55
87
  },
88
  {
89
- "epoch": 0.17569546120058566,
90
- "grad_norm": 4.03125,
91
- "learning_rate": 5.853658536585366e-06,
92
- "loss": 1.0135,
93
  "step": 60
94
  },
95
  {
96
- "epoch": 0.1903367496339678,
97
- "grad_norm": 3.84375,
98
- "learning_rate": 6.341463414634147e-06,
99
- "loss": 0.9753,
100
  "step": 65
101
  },
102
  {
103
- "epoch": 0.20497803806734993,
104
- "grad_norm": 3.6875,
105
- "learning_rate": 6.829268292682928e-06,
106
- "loss": 0.9603,
107
  "step": 70
108
  },
109
  {
110
- "epoch": 0.21961932650073207,
111
- "grad_norm": 3.75,
112
- "learning_rate": 7.317073170731707e-06,
113
- "loss": 0.9181,
114
  "step": 75
115
  },
116
  {
117
- "epoch": 0.2342606149341142,
118
- "grad_norm": 3.5,
119
- "learning_rate": 7.804878048780489e-06,
120
- "loss": 0.9099,
121
  "step": 80
122
  },
123
  {
124
- "epoch": 0.24890190336749635,
125
- "grad_norm": 3.96875,
126
- "learning_rate": 8.292682926829268e-06,
127
- "loss": 0.9165,
128
  "step": 85
129
  },
130
  {
131
- "epoch": 0.2635431918008785,
132
- "grad_norm": 3.390625,
133
- "learning_rate": 8.78048780487805e-06,
134
- "loss": 0.8916,
135
  "step": 90
136
  },
137
  {
138
- "epoch": 0.2781844802342606,
139
- "grad_norm": 3.546875,
140
- "learning_rate": 9.268292682926831e-06,
141
- "loss": 0.8939,
142
  "step": 95
143
  },
144
  {
145
- "epoch": 0.29282576866764276,
146
- "grad_norm": 3.484375,
147
- "learning_rate": 9.756097560975611e-06,
148
- "loss": 0.8707,
149
  "step": 100
150
  },
151
  {
152
- "epoch": 0.3074670571010249,
153
- "grad_norm": 3.53125,
154
- "learning_rate": 1.024390243902439e-05,
155
- "loss": 0.8536,
156
  "step": 105
157
  },
158
  {
159
- "epoch": 0.32210834553440704,
160
- "grad_norm": 3.484375,
161
- "learning_rate": 1.0731707317073172e-05,
162
- "loss": 0.828,
163
  "step": 110
164
  },
165
  {
166
- "epoch": 0.3367496339677892,
167
- "grad_norm": 3.796875,
168
- "learning_rate": 1.1219512195121953e-05,
169
- "loss": 0.8076,
170
  "step": 115
171
  },
172
  {
173
- "epoch": 0.3513909224011713,
174
- "grad_norm": 3.34375,
175
- "learning_rate": 1.1707317073170731e-05,
176
- "loss": 0.8242,
177
  "step": 120
178
  },
179
  {
180
- "epoch": 0.36603221083455345,
181
- "grad_norm": 3.359375,
182
- "learning_rate": 1.2195121951219513e-05,
183
- "loss": 0.8349,
184
  "step": 125
185
  },
186
  {
187
- "epoch": 0.3806734992679356,
188
- "grad_norm": 3.34375,
189
- "learning_rate": 1.2682926829268294e-05,
190
- "loss": 0.8274,
191
  "step": 130
192
  },
193
  {
194
- "epoch": 0.3953147877013177,
195
- "grad_norm": 2.96875,
196
- "learning_rate": 1.3170731707317076e-05,
197
- "loss": 0.7836,
198
  "step": 135
199
  },
200
  {
201
- "epoch": 0.40995607613469986,
202
- "grad_norm": 3.125,
203
- "learning_rate": 1.3658536585365855e-05,
204
- "loss": 0.8193,
205
  "step": 140
206
  },
207
  {
208
- "epoch": 0.424597364568082,
209
- "grad_norm": 3.484375,
210
- "learning_rate": 1.4146341463414635e-05,
211
- "loss": 0.8256,
212
  "step": 145
213
  },
214
  {
215
- "epoch": 0.43923865300146414,
216
- "grad_norm": 2.890625,
217
- "learning_rate": 1.4634146341463415e-05,
218
- "loss": 0.8126,
219
  "step": 150
220
  },
221
  {
222
- "epoch": 0.4538799414348463,
223
- "grad_norm": 3.203125,
224
- "learning_rate": 1.5121951219512196e-05,
225
- "loss": 0.79,
226
  "step": 155
227
  },
228
  {
229
- "epoch": 0.4685212298682284,
230
- "grad_norm": 3.09375,
231
- "learning_rate": 1.5609756097560978e-05,
232
- "loss": 0.7964,
233
  "step": 160
234
  },
235
  {
236
- "epoch": 0.48316251830161056,
237
- "grad_norm": 3.296875,
238
- "learning_rate": 1.6097560975609757e-05,
239
- "loss": 0.8435,
240
  "step": 165
241
  },
242
  {
243
- "epoch": 0.4978038067349927,
244
- "grad_norm": 3.203125,
245
- "learning_rate": 1.6585365853658537e-05,
246
- "loss": 0.8321,
247
  "step": 170
248
  },
249
  {
250
- "epoch": 0.5124450951683748,
251
- "grad_norm": 3.671875,
252
- "learning_rate": 1.7073170731707317e-05,
253
- "loss": 0.816,
254
  "step": 175
255
  },
256
  {
257
- "epoch": 0.527086383601757,
258
- "grad_norm": 3.453125,
259
- "learning_rate": 1.75609756097561e-05,
260
- "loss": 0.8269,
261
  "step": 180
262
  },
263
  {
264
- "epoch": 0.541727672035139,
265
- "grad_norm": 3.0,
266
- "learning_rate": 1.804878048780488e-05,
267
- "loss": 0.8218,
268
  "step": 185
269
  },
270
  {
271
- "epoch": 0.5563689604685212,
272
  "grad_norm": 3.3125,
273
- "learning_rate": 1.8536585365853663e-05,
274
- "loss": 0.7842,
275
  "step": 190
276
  },
277
  {
278
- "epoch": 0.5710102489019033,
279
- "grad_norm": 2.96875,
280
- "learning_rate": 1.902439024390244e-05,
281
- "loss": 0.796,
282
  "step": 195
283
  },
284
  {
285
- "epoch": 0.5856515373352855,
286
- "grad_norm": 3.09375,
287
- "learning_rate": 1.9512195121951222e-05,
288
- "loss": 0.8302,
289
  "step": 200
290
  },
291
  {
292
- "epoch": 0.6002928257686676,
293
- "grad_norm": 2.921875,
294
- "learning_rate": 2e-05,
295
- "loss": 0.7881,
296
  "step": 205
297
  },
298
  {
299
- "epoch": 0.6149341142020498,
300
- "grad_norm": 2.78125,
301
- "learning_rate": 1.9999636001539654e-05,
302
- "loss": 0.7963,
303
  "step": 210
304
  },
305
  {
306
- "epoch": 0.6295754026354319,
307
- "grad_norm": 2.953125,
308
- "learning_rate": 1.999854403265758e-05,
309
- "loss": 0.8286,
310
  "step": 215
311
  },
312
  {
313
- "epoch": 0.6442166910688141,
314
- "grad_norm": 2.890625,
315
- "learning_rate": 1.9996724172848786e-05,
316
- "loss": 0.8367,
317
  "step": 220
318
  },
319
  {
320
- "epoch": 0.6588579795021962,
321
- "grad_norm": 2.953125,
322
- "learning_rate": 1.99941765545985e-05,
323
- "loss": 0.8145,
324
  "step": 225
325
  },
326
  {
327
- "epoch": 0.6734992679355783,
328
- "grad_norm": 2.75,
329
- "learning_rate": 1.9990901363372548e-05,
330
- "loss": 0.7916,
331
  "step": 230
332
  },
333
  {
334
- "epoch": 0.6881405563689604,
335
- "grad_norm": 2.796875,
336
- "learning_rate": 1.9986898837603842e-05,
337
- "loss": 0.8106,
338
  "step": 235
339
  },
340
  {
341
- "epoch": 0.7027818448023426,
342
- "grad_norm": 2.828125,
343
- "learning_rate": 1.9982169268675024e-05,
344
- "loss": 0.7791,
345
  "step": 240
346
  },
347
  {
348
- "epoch": 0.7174231332357247,
349
- "grad_norm": 2.859375,
350
- "learning_rate": 1.9976713000897262e-05,
351
- "loss": 0.8273,
352
  "step": 245
353
  },
354
  {
355
- "epoch": 0.7320644216691069,
356
- "grad_norm": 2.796875,
357
- "learning_rate": 1.9970530431485163e-05,
358
- "loss": 0.8084,
359
  "step": 250
360
  },
361
  {
362
- "epoch": 0.746705710102489,
363
- "grad_norm": 2.625,
364
- "learning_rate": 1.9963622010527877e-05,
365
- "loss": 0.7724,
366
  "step": 255
367
  },
368
  {
369
- "epoch": 0.7613469985358712,
370
- "grad_norm": 2.640625,
371
- "learning_rate": 1.9955988240956327e-05,
372
- "loss": 0.7838,
373
  "step": 260
374
  },
375
  {
376
- "epoch": 0.7759882869692533,
377
- "grad_norm": 3.015625,
378
- "learning_rate": 1.9947629678506586e-05,
379
- "loss": 0.8028,
380
  "step": 265
381
  },
382
  {
383
- "epoch": 0.7906295754026355,
384
- "grad_norm": 2.84375,
385
- "learning_rate": 1.993854693167942e-05,
386
- "loss": 0.7766,
387
  "step": 270
388
  },
389
  {
390
- "epoch": 0.8052708638360175,
391
- "grad_norm": 2.90625,
392
- "learning_rate": 1.992874066169601e-05,
393
- "loss": 0.8095,
394
  "step": 275
395
  },
396
  {
397
- "epoch": 0.8199121522693997,
398
- "grad_norm": 3.109375,
399
- "learning_rate": 1.991821158244979e-05,
400
- "loss": 0.773,
401
  "step": 280
402
  },
403
  {
404
- "epoch": 0.8345534407027818,
405
- "grad_norm": 2.6875,
406
- "learning_rate": 1.990696046045448e-05,
407
- "loss": 0.7843,
408
  "step": 285
409
  },
410
  {
411
- "epoch": 0.849194729136164,
412
- "grad_norm": 2.84375,
413
- "learning_rate": 1.98949881147883e-05,
414
- "loss": 0.8031,
415
  "step": 290
416
  },
417
  {
418
- "epoch": 0.8638360175695461,
419
- "grad_norm": 2.609375,
420
- "learning_rate": 1.9882295417034334e-05,
421
- "loss": 0.7855,
422
  "step": 295
423
  },
424
  {
425
- "epoch": 0.8784773060029283,
426
- "grad_norm": 2.75,
427
- "learning_rate": 1.986888329121706e-05,
428
- "loss": 0.7705,
429
  "step": 300
430
  },
431
  {
432
- "epoch": 0.8931185944363104,
433
- "grad_norm": 2.90625,
434
- "learning_rate": 1.9854752713735115e-05,
435
- "loss": 0.8004,
436
  "step": 305
437
  },
438
  {
439
- "epoch": 0.9077598828696926,
440
- "grad_norm": 2.984375,
441
- "learning_rate": 1.9839904713290186e-05,
442
- "loss": 0.7705,
443
  "step": 310
444
  },
445
  {
446
- "epoch": 0.9224011713030746,
447
- "grad_norm": 2.84375,
448
- "learning_rate": 1.982434037081213e-05,
449
- "loss": 0.759,
450
  "step": 315
451
  },
452
  {
453
- "epoch": 0.9370424597364568,
454
- "grad_norm": 2.71875,
455
- "learning_rate": 1.980806081938029e-05,
456
- "loss": 0.7424,
457
  "step": 320
458
  },
459
  {
460
- "epoch": 0.9516837481698389,
461
- "grad_norm": 2.8125,
462
- "learning_rate": 1.9791067244141e-05,
463
- "loss": 0.7511,
464
  "step": 325
465
  },
466
  {
467
- "epoch": 0.9663250366032211,
468
- "grad_norm": 3.015625,
469
- "learning_rate": 1.97733608822213e-05,
470
- "loss": 0.7854,
471
  "step": 330
472
  },
473
  {
474
- "epoch": 0.9809663250366032,
475
- "grad_norm": 2.59375,
476
- "learning_rate": 1.975494302263889e-05,
477
- "loss": 0.7465,
478
  "step": 335
479
  },
480
  {
481
- "epoch": 0.9956076134699854,
482
- "grad_norm": 2.59375,
483
- "learning_rate": 1.973581500620827e-05,
484
- "loss": 0.7334,
485
  "step": 340
486
  },
487
  {
488
- "epoch": 1.0102489019033676,
489
- "grad_norm": 2.828125,
490
- "learning_rate": 1.9715978225443147e-05,
491
- "loss": 0.6681,
492
  "step": 345
493
  },
494
  {
495
- "epoch": 1.0248901903367496,
496
- "grad_norm": 2.953125,
497
- "learning_rate": 1.969543412445506e-05,
498
- "loss": 0.5843,
499
  "step": 350
500
  },
501
  {
502
- "epoch": 1.0395314787701317,
503
- "grad_norm": 2.953125,
504
- "learning_rate": 1.9674184198848227e-05,
505
- "loss": 0.5876,
506
  "step": 355
507
  },
508
  {
509
- "epoch": 1.054172767203514,
510
- "grad_norm": 3.125,
511
- "learning_rate": 1.965222999561069e-05,
512
- "loss": 0.5753,
513
  "step": 360
514
  },
515
  {
516
- "epoch": 1.0688140556368961,
517
- "grad_norm": 2.96875,
518
- "learning_rate": 1.9629573113001685e-05,
519
- "loss": 0.5774,
520
  "step": 365
521
  },
522
  {
523
- "epoch": 1.083455344070278,
524
- "grad_norm": 3.359375,
525
- "learning_rate": 1.960621520043529e-05,
526
- "loss": 0.5742,
527
  "step": 370
528
  },
529
  {
530
- "epoch": 1.0980966325036603,
531
- "grad_norm": 3.34375,
532
- "learning_rate": 1.9582157958360347e-05,
533
- "loss": 0.586,
534
  "step": 375
535
  },
536
  {
537
- "epoch": 1.1127379209370425,
538
- "grad_norm": 3.390625,
539
- "learning_rate": 1.9557403138136672e-05,
540
- "loss": 0.5973,
541
  "step": 380
542
  },
543
  {
544
- "epoch": 1.1273792093704247,
545
- "grad_norm": 3.171875,
546
- "learning_rate": 1.9531952541907553e-05,
547
- "loss": 0.5753,
548
  "step": 385
549
  },
550
  {
551
- "epoch": 1.1420204978038067,
552
- "grad_norm": 3.28125,
553
- "learning_rate": 1.9505808022468556e-05,
554
- "loss": 0.5725,
555
  "step": 390
556
  },
557
  {
558
- "epoch": 1.1566617862371888,
559
- "grad_norm": 3.3125,
560
- "learning_rate": 1.9478971483132657e-05,
561
- "loss": 0.5681,
562
  "step": 395
563
  },
564
  {
565
- "epoch": 1.171303074670571,
566
- "grad_norm": 3.09375,
567
- "learning_rate": 1.9451444877591638e-05,
568
- "loss": 0.5648,
569
  "step": 400
570
  },
571
  {
572
- "epoch": 1.1859443631039532,
573
  "grad_norm": 3.78125,
574
- "learning_rate": 1.942323020977392e-05,
575
- "loss": 0.5605,
576
  "step": 405
577
  },
578
  {
579
- "epoch": 1.2005856515373352,
580
- "grad_norm": 3.046875,
581
- "learning_rate": 1.9394329533698625e-05,
582
- "loss": 0.5667,
583
  "step": 410
584
  },
585
  {
586
- "epoch": 1.2152269399707174,
587
- "grad_norm": 3.28125,
588
- "learning_rate": 1.9364744953326077e-05,
589
- "loss": 0.5603,
590
  "step": 415
591
  },
592
  {
593
- "epoch": 1.2298682284040996,
594
- "grad_norm": 3.515625,
595
- "learning_rate": 1.933447862240461e-05,
596
- "loss": 0.5882,
597
  "step": 420
598
  },
599
  {
600
- "epoch": 1.2445095168374818,
601
- "grad_norm": 2.96875,
602
- "learning_rate": 1.9303532744313804e-05,
603
- "loss": 0.5981,
604
  "step": 425
605
  },
606
  {
607
- "epoch": 1.2591508052708638,
608
- "grad_norm": 3.375,
609
- "learning_rate": 1.9271909571904048e-05,
610
- "loss": 0.5889,
611
  "step": 430
612
  },
613
  {
614
- "epoch": 1.273792093704246,
615
- "grad_norm": 3.3125,
616
- "learning_rate": 1.9239611407332562e-05,
617
- "loss": 0.5999,
618
  "step": 435
619
  },
620
  {
621
- "epoch": 1.2884333821376281,
622
- "grad_norm": 2.890625,
623
- "learning_rate": 1.920664060189577e-05,
624
- "loss": 0.5523,
625
  "step": 440
626
  },
627
  {
628
- "epoch": 1.3030746705710103,
629
- "grad_norm": 3.828125,
630
- "learning_rate": 1.9172999555858167e-05,
631
- "loss": 0.5457,
632
  "step": 445
633
  },
634
  {
635
- "epoch": 1.3177159590043923,
636
- "grad_norm": 3.140625,
637
- "learning_rate": 1.9138690718277542e-05,
638
- "loss": 0.615,
639
  "step": 450
640
  },
641
  {
642
- "epoch": 1.3323572474377745,
643
- "grad_norm": 3.421875,
644
- "learning_rate": 1.9103716586826705e-05,
645
- "loss": 0.5743,
646
  "step": 455
647
  },
648
  {
649
- "epoch": 1.3469985358711567,
650
- "grad_norm": 3.3125,
651
- "learning_rate": 1.9068079707611653e-05,
652
- "loss": 0.5638,
653
  "step": 460
654
  },
655
  {
656
- "epoch": 1.3616398243045387,
657
- "grad_norm": 3.390625,
658
- "learning_rate": 1.9031782674986224e-05,
659
- "loss": 0.5822,
660
  "step": 465
661
  },
662
  {
663
- "epoch": 1.3762811127379209,
664
- "grad_norm": 3.296875,
665
- "learning_rate": 1.8994828131363216e-05,
666
- "loss": 0.5712,
667
  "step": 470
668
  },
669
  {
670
- "epoch": 1.390922401171303,
671
- "grad_norm": 3.90625,
672
- "learning_rate": 1.895721876702202e-05,
673
- "loss": 0.5913,
674
  "step": 475
675
  },
676
  {
677
- "epoch": 1.4055636896046853,
678
- "grad_norm": 3.15625,
679
- "learning_rate": 1.8918957319912783e-05,
680
- "loss": 0.5423,
681
  "step": 480
682
  },
683
  {
684
- "epoch": 1.4202049780380674,
685
- "grad_norm": 3.21875,
686
- "learning_rate": 1.8880046575457072e-05,
687
- "loss": 0.556,
688
  "step": 485
689
  },
690
  {
691
- "epoch": 1.4348462664714494,
692
- "grad_norm": 3.515625,
693
- "learning_rate": 1.8840489366345108e-05,
694
- "loss": 0.5614,
695
  "step": 490
696
  },
697
  {
698
- "epoch": 1.4494875549048316,
699
- "grad_norm": 3.15625,
700
- "learning_rate": 1.880028857232952e-05,
701
- "loss": 0.5605,
702
  "step": 495
703
  },
704
  {
705
- "epoch": 1.4641288433382138,
706
- "grad_norm": 3.8125,
707
- "learning_rate": 1.8759447120015747e-05,
708
- "loss": 0.5422,
709
  "step": 500
710
  },
711
  {
712
- "epoch": 1.4787701317715958,
713
- "grad_norm": 3.484375,
714
- "learning_rate": 1.8717967982648932e-05,
715
- "loss": 0.5633,
716
  "step": 505
717
  },
718
  {
719
- "epoch": 1.493411420204978,
720
- "grad_norm": 3.296875,
721
- "learning_rate": 1.8675854179897505e-05,
722
- "loss": 0.5528,
723
  "step": 510
724
  },
725
  {
726
- "epoch": 1.5080527086383602,
727
- "grad_norm": 3.65625,
728
- "learning_rate": 1.863310877763334e-05,
729
- "loss": 0.5439,
730
  "step": 515
731
  },
732
  {
733
- "epoch": 1.5226939970717424,
734
- "grad_norm": 3.328125,
735
- "learning_rate": 1.8589734887708556e-05,
736
- "loss": 0.5317,
737
  "step": 520
738
  },
739
  {
740
- "epoch": 1.5373352855051245,
741
- "grad_norm": 3.21875,
742
- "learning_rate": 1.8545735667728988e-05,
743
- "loss": 0.5614,
744
  "step": 525
745
  },
746
  {
747
- "epoch": 1.5519765739385067,
748
- "grad_norm": 3.1875,
749
- "learning_rate": 1.85011143208243e-05,
750
- "loss": 0.5364,
751
  "step": 530
752
  },
753
  {
754
- "epoch": 1.5666178623718887,
755
- "grad_norm": 3.21875,
756
- "learning_rate": 1.8455874095414802e-05,
757
- "loss": 0.5336,
758
  "step": 535
759
  },
760
  {
761
- "epoch": 1.581259150805271,
762
- "grad_norm": 3.40625,
763
- "learning_rate": 1.8410018284974976e-05,
764
- "loss": 0.5601,
765
  "step": 540
766
  },
767
  {
768
- "epoch": 1.5959004392386529,
769
- "grad_norm": 3.1875,
770
- "learning_rate": 1.8363550227793707e-05,
771
- "loss": 0.5469,
772
  "step": 545
773
  },
774
  {
775
- "epoch": 1.610541727672035,
776
- "grad_norm": 3.53125,
777
- "learning_rate": 1.8316473306731243e-05,
778
- "loss": 0.576,
779
  "step": 550
780
  },
781
  {
782
- "epoch": 1.6251830161054173,
783
- "grad_norm": 3.140625,
784
- "learning_rate": 1.8268790948972942e-05,
785
- "loss": 0.5526,
786
  "step": 555
787
  },
788
  {
789
- "epoch": 1.6398243045387995,
790
- "grad_norm": 3.96875,
791
- "learning_rate": 1.822050662577976e-05,
792
- "loss": 0.5443,
793
  "step": 560
794
  },
795
  {
796
- "epoch": 1.6544655929721817,
797
- "grad_norm": 3.21875,
798
- "learning_rate": 1.817162385223557e-05,
799
- "loss": 0.5207,
800
  "step": 565
801
  },
802
  {
803
- "epoch": 1.6691068814055638,
804
- "grad_norm": 3.765625,
805
- "learning_rate": 1.8122146186991224e-05,
806
- "loss": 0.5183,
807
  "step": 570
808
  },
809
  {
810
- "epoch": 1.6837481698389458,
811
- "grad_norm": 3.484375,
812
- "learning_rate": 1.807207723200552e-05,
813
- "loss": 0.5613,
814
  "step": 575
815
  },
816
  {
817
- "epoch": 1.698389458272328,
818
- "grad_norm": 3.59375,
819
- "learning_rate": 1.802142063228296e-05,
820
- "loss": 0.5463,
821
  "step": 580
822
  },
823
  {
824
- "epoch": 1.71303074670571,
825
- "grad_norm": 3.28125,
826
- "learning_rate": 1.797018007560841e-05,
827
- "loss": 0.5256,
828
  "step": 585
829
  },
830
  {
831
- "epoch": 1.7276720351390922,
832
- "grad_norm": 3.15625,
833
- "learning_rate": 1.7918359292278614e-05,
834
- "loss": 0.5519,
835
  "step": 590
836
  },
837
  {
838
- "epoch": 1.7423133235724744,
839
- "grad_norm": 4.03125,
840
- "learning_rate": 1.7865962054830642e-05,
841
- "loss": 0.5327,
842
  "step": 595
843
  },
844
  {
845
- "epoch": 1.7569546120058566,
846
- "grad_norm": 3.25,
847
- "learning_rate": 1.7812992177767244e-05,
848
- "loss": 0.5738,
849
  "step": 600
850
  },
851
  {
852
- "epoch": 1.7715959004392388,
853
- "grad_norm": 3.296875,
854
- "learning_rate": 1.7759453517279158e-05,
855
- "loss": 0.5539,
856
  "step": 605
857
  },
858
  {
859
- "epoch": 1.786237188872621,
860
- "grad_norm": 3.515625,
861
- "learning_rate": 1.770534997096439e-05,
862
- "loss": 0.5491,
863
  "step": 610
864
  },
865
  {
866
- "epoch": 1.800878477306003,
867
- "grad_norm": 3.53125,
868
- "learning_rate": 1.7650685477544442e-05,
869
- "loss": 0.5324,
870
  "step": 615
871
  },
872
  {
873
- "epoch": 1.8155197657393851,
874
- "grad_norm": 3.203125,
875
- "learning_rate": 1.7595464016577606e-05,
876
- "loss": 0.5314,
877
  "step": 620
878
  },
879
  {
880
- "epoch": 1.830161054172767,
881
- "grad_norm": 3.484375,
882
- "learning_rate": 1.753968960816924e-05,
883
- "loss": 0.5166,
884
  "step": 625
885
  },
886
  {
887
- "epoch": 1.8448023426061493,
888
- "grad_norm": 3.53125,
889
- "learning_rate": 1.748336631267909e-05,
890
- "loss": 0.5021,
891
  "step": 630
892
  },
893
  {
894
- "epoch": 1.8594436310395315,
895
- "grad_norm": 3.546875,
896
- "learning_rate": 1.7426498230425736e-05,
897
- "loss": 0.52,
898
  "step": 635
899
  },
900
  {
901
- "epoch": 1.8740849194729137,
902
- "grad_norm": 3.265625,
903
- "learning_rate": 1.7369089501388047e-05,
904
- "loss": 0.513,
905
  "step": 640
906
  },
907
  {
908
- "epoch": 1.8887262079062959,
909
- "grad_norm": 3.453125,
910
- "learning_rate": 1.7311144304903826e-05,
911
- "loss": 0.5652,
912
- "step": 645
913
- },
914
- {
915
- "epoch": 1.903367496339678,
916
- "grad_norm": 3.4375,
917
- "learning_rate": 1.7252666859365525e-05,
918
- "loss": 0.5579,
919
- "step": 650
920
- },
921
- {
922
- "epoch": 1.91800878477306,
923
- "grad_norm": 4.09375,
924
- "learning_rate": 1.719366142191318e-05,
925
- "loss": 0.5111,
926
- "step": 655
927
- },
928
- {
929
- "epoch": 1.9326500732064422,
930
- "grad_norm": 3.8125,
931
- "learning_rate": 1.7134132288124464e-05,
932
- "loss": 0.5121,
933
- "step": 660
934
- },
935
- {
936
- "epoch": 1.9472913616398242,
937
- "grad_norm": 3.5,
938
- "learning_rate": 1.707408379170199e-05,
939
- "loss": 0.4807,
940
- "step": 665
941
- },
942
- {
943
- "epoch": 1.9619326500732064,
944
- "grad_norm": 3.8125,
945
- "learning_rate": 1.7013520304157802e-05,
946
- "loss": 0.5231,
947
- "step": 670
948
- },
949
- {
950
- "epoch": 1.9765739385065886,
951
- "grad_norm": 4.125,
952
- "learning_rate": 1.6952446234495145e-05,
953
- "loss": 0.5067,
954
- "step": 675
955
- },
956
- {
957
- "epoch": 1.9912152269399708,
958
- "grad_norm": 3.984375,
959
- "learning_rate": 1.6890866028887484e-05,
960
- "loss": 0.5486,
961
- "step": 680
962
- },
963
- {
964
- "epoch": 2.005856515373353,
965
- "grad_norm": 3.390625,
966
- "learning_rate": 1.682878417035483e-05,
967
- "loss": 0.4237,
968
- "step": 685
969
- },
970
- {
971
- "epoch": 2.020497803806735,
972
- "grad_norm": 4.5625,
973
- "learning_rate": 1.676620517843736e-05,
974
- "loss": 0.2785,
975
- "step": 690
976
- },
977
- {
978
- "epoch": 2.035139092240117,
979
- "grad_norm": 3.703125,
980
- "learning_rate": 1.6703133608866415e-05,
981
- "loss": 0.2647,
982
- "step": 695
983
- },
984
- {
985
- "epoch": 2.049780380673499,
986
- "grad_norm": 3.375,
987
- "learning_rate": 1.6639574053232844e-05,
988
- "loss": 0.2726,
989
- "step": 700
990
- },
991
- {
992
- "epoch": 2.0644216691068813,
993
- "grad_norm": 3.859375,
994
- "learning_rate": 1.6575531138652726e-05,
995
- "loss": 0.2848,
996
- "step": 705
997
- },
998
- {
999
- "epoch": 2.0790629575402635,
1000
- "grad_norm": 3.65625,
1001
- "learning_rate": 1.6511009527430516e-05,
1002
- "loss": 0.2691,
1003
- "step": 710
1004
- },
1005
- {
1006
- "epoch": 2.0937042459736457,
1007
- "grad_norm": 3.640625,
1008
- "learning_rate": 1.6446013916719646e-05,
1009
- "loss": 0.2744,
1010
- "step": 715
1011
- },
1012
- {
1013
- "epoch": 2.108345534407028,
1014
- "grad_norm": 3.78125,
1015
- "learning_rate": 1.638054903818056e-05,
1016
- "loss": 0.2755,
1017
- "step": 720
1018
- },
1019
- {
1020
- "epoch": 2.12298682284041,
1021
- "grad_norm": 3.78125,
1022
- "learning_rate": 1.6314619657636258e-05,
1023
- "loss": 0.2661,
1024
- "step": 725
1025
- },
1026
- {
1027
- "epoch": 2.1376281112737923,
1028
- "grad_norm": 4.46875,
1029
- "learning_rate": 1.624823057472534e-05,
1030
- "loss": 0.2548,
1031
- "step": 730
1032
- },
1033
- {
1034
- "epoch": 2.1522693997071745,
1035
- "grad_norm": 4.28125,
1036
- "learning_rate": 1.6181386622552607e-05,
1037
- "loss": 0.2666,
1038
- "step": 735
1039
- },
1040
- {
1041
- "epoch": 2.166910688140556,
1042
- "grad_norm": 3.921875,
1043
- "learning_rate": 1.611409266733718e-05,
1044
- "loss": 0.2717,
1045
- "step": 740
1046
- },
1047
- {
1048
- "epoch": 2.1815519765739384,
1049
- "grad_norm": 4.3125,
1050
- "learning_rate": 1.604635360805829e-05,
1051
- "loss": 0.2657,
1052
- "step": 745
1053
- },
1054
- {
1055
- "epoch": 2.1961932650073206,
1056
- "grad_norm": 4.46875,
1057
- "learning_rate": 1.5978174376098588e-05,
1058
- "loss": 0.2823,
1059
- "step": 750
1060
- },
1061
- {
1062
- "epoch": 2.210834553440703,
1063
- "grad_norm": 3.8125,
1064
- "learning_rate": 1.590955993488516e-05,
1065
- "loss": 0.2748,
1066
- "step": 755
1067
- },
1068
- {
1069
- "epoch": 2.225475841874085,
1070
- "grad_norm": 3.765625,
1071
- "learning_rate": 1.584051527952821e-05,
1072
- "loss": 0.2524,
1073
- "step": 760
1074
- },
1075
- {
1076
- "epoch": 2.240117130307467,
1077
- "grad_norm": 3.890625,
1078
- "learning_rate": 1.577104543645738e-05,
1079
- "loss": 0.2782,
1080
- "step": 765
1081
- },
1082
- {
1083
- "epoch": 2.2547584187408494,
1084
- "grad_norm": 4.125,
1085
- "learning_rate": 1.5701155463055858e-05,
1086
- "loss": 0.2663,
1087
- "step": 770
1088
- },
1089
- {
1090
- "epoch": 2.269399707174231,
1091
- "grad_norm": 4.03125,
1092
- "learning_rate": 1.563085044729218e-05,
1093
- "loss": 0.2968,
1094
- "step": 775
1095
- },
1096
- {
1097
- "epoch": 2.2840409956076133,
1098
- "grad_norm": 4.28125,
1099
- "learning_rate": 1.5560135507349848e-05,
1100
- "loss": 0.2751,
1101
- "step": 780
1102
- },
1103
- {
1104
- "epoch": 2.2986822840409955,
1105
- "grad_norm": 4.03125,
1106
- "learning_rate": 1.5489015791254714e-05,
1107
- "loss": 0.2626,
1108
- "step": 785
1109
- },
1110
- {
1111
- "epoch": 2.3133235724743777,
1112
- "grad_norm": 4.90625,
1113
- "learning_rate": 1.5417496476500212e-05,
1114
- "loss": 0.2656,
1115
- "step": 790
1116
- },
1117
- {
1118
- "epoch": 2.32796486090776,
1119
- "grad_norm": 4.375,
1120
- "learning_rate": 1.5345582769670428e-05,
1121
- "loss": 0.2692,
1122
- "step": 795
1123
- },
1124
- {
1125
- "epoch": 2.342606149341142,
1126
- "grad_norm": 3.703125,
1127
- "learning_rate": 1.5273279906061082e-05,
1128
- "loss": 0.2497,
1129
- "step": 800
1130
- },
1131
- {
1132
- "epoch": 2.3572474377745243,
1133
- "grad_norm": 4.125,
1134
- "learning_rate": 1.5200593149298375e-05,
1135
- "loss": 0.2608,
1136
- "step": 805
1137
- },
1138
- {
1139
- "epoch": 2.3718887262079065,
1140
- "grad_norm": 3.625,
1141
- "learning_rate": 1.512752779095582e-05,
1142
- "loss": 0.2757,
1143
- "step": 810
1144
- },
1145
- {
1146
- "epoch": 2.3865300146412887,
1147
- "grad_norm": 4.03125,
1148
- "learning_rate": 1.5054089150169003e-05,
1149
- "loss": 0.2609,
1150
- "step": 815
1151
- },
1152
- {
1153
- "epoch": 2.4011713030746704,
1154
- "grad_norm": 3.765625,
1155
- "learning_rate": 1.498028257324836e-05,
1156
- "loss": 0.2484,
1157
- "step": 820
1158
- },
1159
- {
1160
- "epoch": 2.4158125915080526,
1161
- "grad_norm": 3.671875,
1162
- "learning_rate": 1.4906113433289963e-05,
1163
- "loss": 0.2557,
1164
- "step": 825
1165
- },
1166
- {
1167
- "epoch": 2.430453879941435,
1168
- "grad_norm": 4.375,
1169
- "learning_rate": 1.4831587129784363e-05,
1170
- "loss": 0.2729,
1171
- "step": 830
1172
- },
1173
- {
1174
- "epoch": 2.445095168374817,
1175
- "grad_norm": 3.875,
1176
- "learning_rate": 1.4756709088223508e-05,
1177
- "loss": 0.2686,
1178
- "step": 835
1179
- },
1180
- {
1181
- "epoch": 2.459736456808199,
1182
- "grad_norm": 3.84375,
1183
- "learning_rate": 1.4681484759705764e-05,
1184
- "loss": 0.269,
1185
- "step": 840
1186
- },
1187
- {
1188
- "epoch": 2.4743777452415814,
1189
- "grad_norm": 4.03125,
1190
- "learning_rate": 1.4605919620539082e-05,
1191
- "loss": 0.2565,
1192
- "step": 845
1193
- },
1194
- {
1195
- "epoch": 2.4890190336749636,
1196
- "grad_norm": 4.375,
1197
- "learning_rate": 1.453001917184233e-05,
1198
- "loss": 0.2632,
1199
- "step": 850
1200
- },
1201
- {
1202
- "epoch": 2.5036603221083453,
1203
- "grad_norm": 3.78125,
1204
- "learning_rate": 1.4453788939144793e-05,
1205
- "loss": 0.2623,
1206
- "step": 855
1207
- },
1208
- {
1209
- "epoch": 2.5183016105417275,
1210
- "grad_norm": 4.1875,
1211
- "learning_rate": 1.4377234471983944e-05,
1212
- "loss": 0.2606,
1213
- "step": 860
1214
- },
1215
- {
1216
- "epoch": 2.5329428989751097,
1217
- "grad_norm": 4.40625,
1218
- "learning_rate": 1.430036134350142e-05,
1219
- "loss": 0.2519,
1220
- "step": 865
1221
- },
1222
- {
1223
- "epoch": 2.547584187408492,
1224
- "grad_norm": 4.15625,
1225
- "learning_rate": 1.4223175150037297e-05,
1226
- "loss": 0.2638,
1227
- "step": 870
1228
- },
1229
- {
1230
- "epoch": 2.562225475841874,
1231
- "grad_norm": 3.9375,
1232
- "learning_rate": 1.4145681510722694e-05,
1233
- "loss": 0.2616,
1234
- "step": 875
1235
- },
1236
- {
1237
- "epoch": 2.5768667642752563,
1238
- "grad_norm": 4.34375,
1239
- "learning_rate": 1.406788606707069e-05,
1240
- "loss": 0.2691,
1241
- "step": 880
1242
- },
1243
- {
1244
- "epoch": 2.5915080527086385,
1245
- "grad_norm": 3.71875,
1246
- "learning_rate": 1.398979448256563e-05,
1247
- "loss": 0.2695,
1248
- "step": 885
1249
- },
1250
- {
1251
- "epoch": 2.6061493411420207,
1252
- "grad_norm": 3.890625,
1253
- "learning_rate": 1.3911412442250818e-05,
1254
- "loss": 0.2566,
1255
- "step": 890
1256
- },
1257
- {
1258
- "epoch": 2.620790629575403,
1259
- "grad_norm": 4.03125,
1260
- "learning_rate": 1.3832745652314652e-05,
1261
- "loss": 0.2865,
1262
- "step": 895
1263
- },
1264
- {
1265
- "epoch": 2.6354319180087846,
1266
- "grad_norm": 4.21875,
1267
- "learning_rate": 1.3753799839675215e-05,
1268
- "loss": 0.2768,
1269
- "step": 900
1270
- },
1271
- {
1272
- "epoch": 2.650073206442167,
1273
- "grad_norm": 3.421875,
1274
- "learning_rate": 1.3674580751563357e-05,
1275
- "loss": 0.2611,
1276
- "step": 905
1277
- },
1278
- {
1279
- "epoch": 2.664714494875549,
1280
- "grad_norm": 4.5625,
1281
- "learning_rate": 1.3595094155104297e-05,
1282
- "loss": 0.2618,
1283
- "step": 910
1284
- },
1285
- {
1286
- "epoch": 2.679355783308931,
1287
- "grad_norm": 4.125,
1288
- "learning_rate": 1.3515345836897789e-05,
1289
- "loss": 0.2693,
1290
- "step": 915
1291
- },
1292
- {
1293
- "epoch": 2.6939970717423134,
1294
- "grad_norm": 4.3125,
1295
- "learning_rate": 1.3435341602596834e-05,
1296
- "loss": 0.2566,
1297
- "step": 920
1298
- },
1299
- {
1300
- "epoch": 2.7086383601756956,
1301
- "grad_norm": 3.953125,
1302
- "learning_rate": 1.3355087276485055e-05,
1303
- "loss": 0.2821,
1304
- "step": 925
1305
- },
1306
- {
1307
- "epoch": 2.7232796486090773,
1308
- "grad_norm": 3.859375,
1309
- "learning_rate": 1.3274588701052679e-05,
1310
- "loss": 0.2684,
1311
- "step": 930
1312
- },
1313
- {
1314
- "epoch": 2.7379209370424595,
1315
- "grad_norm": 4.46875,
1316
- "learning_rate": 1.3193851736571213e-05,
1317
- "loss": 0.2677,
1318
- "step": 935
1319
- },
1320
- {
1321
- "epoch": 2.7525622254758417,
1322
- "grad_norm": 3.984375,
1323
- "learning_rate": 1.3112882260666805e-05,
1324
- "loss": 0.2551,
1325
- "step": 940
1326
- },
1327
- {
1328
- "epoch": 2.767203513909224,
1329
- "grad_norm": 3.734375,
1330
- "learning_rate": 1.3031686167892375e-05,
1331
- "loss": 0.2522,
1332
- "step": 945
1333
- },
1334
- {
1335
- "epoch": 2.781844802342606,
1336
- "grad_norm": 3.984375,
1337
- "learning_rate": 1.2950269369298468e-05,
1338
- "loss": 0.2594,
1339
- "step": 950
1340
- },
1341
- {
1342
- "epoch": 2.7964860907759883,
1343
- "grad_norm": 4.34375,
1344
- "learning_rate": 1.2868637792002952e-05,
1345
- "loss": 0.2512,
1346
- "step": 955
1347
- },
1348
- {
1349
- "epoch": 2.8111273792093705,
1350
- "grad_norm": 4.09375,
1351
- "learning_rate": 1.278679737875952e-05,
1352
- "loss": 0.2649,
1353
- "step": 960
1354
- },
1355
- {
1356
- "epoch": 2.8257686676427527,
1357
- "grad_norm": 4.0,
1358
- "learning_rate": 1.2704754087525051e-05,
1359
- "loss": 0.2579,
1360
- "step": 965
1361
- },
1362
- {
1363
- "epoch": 2.840409956076135,
1364
- "grad_norm": 4.3125,
1365
- "learning_rate": 1.2622513891025889e-05,
1366
- "loss": 0.2583,
1367
- "step": 970
1368
- },
1369
- {
1370
- "epoch": 2.855051244509517,
1371
- "grad_norm": 4.46875,
1372
- "learning_rate": 1.2540082776323009e-05,
1373
- "loss": 0.2682,
1374
- "step": 975
1375
- },
1376
- {
1377
- "epoch": 2.869692532942899,
1378
- "grad_norm": 4.125,
1379
- "learning_rate": 1.2457466744376184e-05,
1380
- "loss": 0.2644,
1381
- "step": 980
1382
- },
1383
- {
1384
- "epoch": 2.884333821376281,
1385
- "grad_norm": 4.125,
1386
- "learning_rate": 1.237467180960709e-05,
1387
- "loss": 0.2643,
1388
- "step": 985
1389
- },
1390
- {
1391
- "epoch": 2.898975109809663,
1392
- "grad_norm": 4.5,
1393
- "learning_rate": 1.2291703999461498e-05,
1394
- "loss": 0.26,
1395
- "step": 990
1396
- },
1397
- {
1398
- "epoch": 2.9136163982430454,
1399
- "grad_norm": 3.921875,
1400
- "learning_rate": 1.2208569353970422e-05,
1401
- "loss": 0.2484,
1402
- "step": 995
1403
- },
1404
- {
1405
- "epoch": 2.9282576866764276,
1406
- "grad_norm": 4.125,
1407
- "learning_rate": 1.2125273925310465e-05,
1408
- "loss": 0.2638,
1409
- "step": 1000
1410
- },
1411
- {
1412
- "epoch": 2.94289897510981,
1413
- "grad_norm": 3.4375,
1414
- "learning_rate": 1.2041823777363185e-05,
1415
- "loss": 0.2515,
1416
- "step": 1005
1417
- },
1418
- {
1419
- "epoch": 2.9575402635431915,
1420
- "grad_norm": 4.21875,
1421
- "learning_rate": 1.1958224985273648e-05,
1422
- "loss": 0.2432,
1423
- "step": 1010
1424
- },
1425
- {
1426
- "epoch": 2.9721815519765737,
1427
- "grad_norm": 4.34375,
1428
- "learning_rate": 1.1874483635008183e-05,
1429
- "loss": 0.2532,
1430
- "step": 1015
1431
- },
1432
- {
1433
- "epoch": 2.986822840409956,
1434
- "grad_norm": 4.03125,
1435
- "learning_rate": 1.1790605822911294e-05,
1436
- "loss": 0.2567,
1437
- "step": 1020
1438
- },
1439
- {
1440
- "epoch": 3.001464128843338,
1441
- "grad_norm": 4.71875,
1442
- "learning_rate": 1.1706597655261883e-05,
1443
- "loss": 0.2133,
1444
- "step": 1025
1445
- },
1446
- {
1447
- "epoch": 3.0161054172767203,
1448
- "grad_norm": 2.578125,
1449
- "learning_rate": 1.1622465247828681e-05,
1450
- "loss": 0.1427,
1451
- "step": 1030
1452
- },
1453
- {
1454
- "epoch": 3.0307467057101025,
1455
- "grad_norm": 5.125,
1456
- "learning_rate": 1.1538214725425046e-05,
1457
- "loss": 0.1203,
1458
- "step": 1035
1459
- },
1460
- {
1461
- "epoch": 3.0453879941434847,
1462
- "grad_norm": 4.59375,
1463
- "learning_rate": 1.1453852221463058e-05,
1464
- "loss": 0.1325,
1465
- "step": 1040
1466
- },
1467
- {
1468
- "epoch": 3.060029282576867,
1469
- "grad_norm": 4.0625,
1470
- "learning_rate": 1.1369383877507035e-05,
1471
- "loss": 0.129,
1472
- "step": 1045
1473
- },
1474
- {
1475
- "epoch": 3.074670571010249,
1476
- "grad_norm": 3.109375,
1477
- "learning_rate": 1.1284815842826402e-05,
1478
- "loss": 0.1179,
1479
- "step": 1050
1480
- },
1481
- {
1482
- "epoch": 3.089311859443631,
1483
- "grad_norm": 3.609375,
1484
- "learning_rate": 1.1200154273948047e-05,
1485
- "loss": 0.1366,
1486
- "step": 1055
1487
- },
1488
- {
1489
- "epoch": 3.103953147877013,
1490
- "grad_norm": 3.34375,
1491
- "learning_rate": 1.1115405334208112e-05,
1492
- "loss": 0.1242,
1493
- "step": 1060
1494
- },
1495
- {
1496
- "epoch": 3.1185944363103952,
1497
- "grad_norm": 3.390625,
1498
- "learning_rate": 1.1030575193303312e-05,
1499
- "loss": 0.1238,
1500
- "step": 1065
1501
- },
1502
- {
1503
- "epoch": 3.1332357247437774,
1504
- "grad_norm": 4.03125,
1505
- "learning_rate": 1.0945670026841785e-05,
1506
- "loss": 0.1238,
1507
- "step": 1070
1508
- },
1509
- {
1510
- "epoch": 3.1478770131771596,
1511
- "grad_norm": 3.09375,
1512
- "learning_rate": 1.0860696015893506e-05,
1513
- "loss": 0.1212,
1514
- "step": 1075
1515
- },
1516
- {
1517
- "epoch": 3.162518301610542,
1518
- "grad_norm": 4.40625,
1519
- "learning_rate": 1.0775659346540303e-05,
1520
- "loss": 0.1249,
1521
- "step": 1080
1522
- },
1523
- {
1524
- "epoch": 3.177159590043924,
1525
- "grad_norm": 4.125,
1526
- "learning_rate": 1.0690566209425521e-05,
1527
- "loss": 0.1257,
1528
- "step": 1085
1529
- },
1530
- {
1531
- "epoch": 3.191800878477306,
1532
- "grad_norm": 4.0,
1533
- "learning_rate": 1.060542279930334e-05,
1534
- "loss": 0.1348,
1535
- "step": 1090
1536
- },
1537
- {
1538
- "epoch": 3.206442166910688,
1539
- "grad_norm": 3.984375,
1540
- "learning_rate": 1.0520235314587796e-05,
1541
- "loss": 0.1277,
1542
- "step": 1095
1543
- },
1544
- {
1545
- "epoch": 3.22108345534407,
1546
- "grad_norm": 3.734375,
1547
- "learning_rate": 1.0435009956901547e-05,
1548
- "loss": 0.128,
1549
- "step": 1100
1550
- },
1551
- {
1552
- "epoch": 3.2357247437774523,
1553
- "grad_norm": 4.65625,
1554
- "learning_rate": 1.034975293062439e-05,
1555
- "loss": 0.1258,
1556
- "step": 1105
1557
- },
1558
- {
1559
- "epoch": 3.2503660322108345,
1560
- "grad_norm": 3.1875,
1561
- "learning_rate": 1.026447044244158e-05,
1562
- "loss": 0.1147,
1563
- "step": 1110
1564
- },
1565
- {
1566
- "epoch": 3.2650073206442167,
1567
- "grad_norm": 3.375,
1568
- "learning_rate": 1.0179168700892001e-05,
1569
- "loss": 0.1231,
1570
- "step": 1115
1571
- },
1572
- {
1573
- "epoch": 3.279648609077599,
1574
- "grad_norm": 3.609375,
1575
- "learning_rate": 1.0093853915916165e-05,
1576
- "loss": 0.1211,
1577
- "step": 1120
1578
- },
1579
- {
1580
- "epoch": 3.294289897510981,
1581
- "grad_norm": 4.0625,
1582
- "learning_rate": 1.0008532298404154e-05,
1583
- "loss": 0.1245,
1584
- "step": 1125
1585
- },
1586
- {
1587
- "epoch": 3.3089311859443633,
1588
- "grad_norm": 3.390625,
1589
- "learning_rate": 9.923210059743447e-06,
1590
- "loss": 0.1307,
1591
- "step": 1130
1592
- },
1593
- {
1594
- "epoch": 3.323572474377745,
1595
- "grad_norm": 4.125,
1596
- "learning_rate": 9.837893411366743e-06,
1597
- "loss": 0.1287,
1598
- "step": 1135
1599
- },
1600
- {
1601
- "epoch": 3.3382137628111272,
1602
- "grad_norm": 2.984375,
1603
- "learning_rate": 9.752588564299776e-06,
1604
- "loss": 0.1177,
1605
- "step": 1140
1606
- },
1607
- {
1608
- "epoch": 3.3528550512445094,
1609
- "grad_norm": 3.796875,
1610
- "learning_rate": 9.66730172870914e-06,
1611
- "loss": 0.1236,
1612
- "step": 1145
1613
- },
1614
- {
1615
- "epoch": 3.3674963396778916,
1616
- "grad_norm": 3.65625,
1617
- "learning_rate": 9.582039113450208e-06,
1618
- "loss": 0.1264,
1619
- "step": 1150
1620
- },
1621
- {
1622
- "epoch": 3.382137628111274,
1623
- "grad_norm": 4.03125,
1624
- "learning_rate": 9.496806925615113e-06,
1625
- "loss": 0.1292,
1626
- "step": 1155
1627
- },
1628
- {
1629
- "epoch": 3.396778916544656,
1630
- "grad_norm": 3.84375,
1631
- "learning_rate": 9.411611370080885e-06,
1632
- "loss": 0.1177,
1633
- "step": 1160
1634
- },
1635
- {
1636
- "epoch": 3.411420204978038,
1637
- "grad_norm": 2.828125,
1638
- "learning_rate": 9.326458649057732e-06,
1639
- "loss": 0.126,
1640
- "step": 1165
1641
- },
1642
- {
1643
- "epoch": 3.42606149341142,
1644
- "grad_norm": 3.21875,
1645
- "learning_rate": 9.241354961637525e-06,
1646
- "loss": 0.117,
1647
- "step": 1170
1648
- },
1649
- {
1650
- "epoch": 3.440702781844802,
1651
- "grad_norm": 3.828125,
1652
- "learning_rate": 9.156306503342499e-06,
1653
- "loss": 0.1203,
1654
- "step": 1175
1655
- },
1656
- {
1657
- "epoch": 3.4553440702781844,
1658
- "grad_norm": 3.640625,
1659
- "learning_rate": 9.07131946567423e-06,
1660
- "loss": 0.1278,
1661
- "step": 1180
1662
- },
1663
- {
1664
- "epoch": 3.4699853587115665,
1665
- "grad_norm": 3.375,
1666
- "learning_rate": 8.986400035662897e-06,
1667
- "loss": 0.124,
1668
- "step": 1185
1669
- },
1670
- {
1671
- "epoch": 3.4846266471449487,
1672
- "grad_norm": 3.5625,
1673
- "learning_rate": 8.901554395416842e-06,
1674
- "loss": 0.1246,
1675
- "step": 1190
1676
- },
1677
- {
1678
- "epoch": 3.499267935578331,
1679
- "grad_norm": 3.5,
1680
- "learning_rate": 8.816788721672565e-06,
1681
- "loss": 0.1224,
1682
- "step": 1195
1683
- },
1684
- {
1685
- "epoch": 3.513909224011713,
1686
- "grad_norm": 3.796875,
1687
- "learning_rate": 8.732109185344995e-06,
1688
- "loss": 0.1263,
1689
- "step": 1200
1690
- },
1691
- {
1692
- "epoch": 3.5285505124450953,
1693
- "grad_norm": 3.390625,
1694
- "learning_rate": 8.647521951078318e-06,
1695
- "loss": 0.1209,
1696
- "step": 1205
1697
- },
1698
- {
1699
- "epoch": 3.5431918008784775,
1700
- "grad_norm": 3.6875,
1701
- "learning_rate": 8.563033176797126e-06,
1702
- "loss": 0.1269,
1703
- "step": 1210
1704
- },
1705
- {
1706
- "epoch": 3.5578330893118597,
1707
- "grad_norm": 3.765625,
1708
- "learning_rate": 8.478649013258186e-06,
1709
- "loss": 0.1332,
1710
- "step": 1215
1711
- },
1712
- {
1713
- "epoch": 3.5724743777452415,
1714
- "grad_norm": 3.171875,
1715
- "learning_rate": 8.394375603602602e-06,
1716
- "loss": 0.1263,
1717
- "step": 1220
1718
- },
1719
- {
1720
- "epoch": 3.5871156661786237,
1721
- "grad_norm": 4.28125,
1722
- "learning_rate": 8.310219082908663e-06,
1723
- "loss": 0.1251,
1724
- "step": 1225
1725
- },
1726
- {
1727
- "epoch": 3.601756954612006,
1728
- "grad_norm": 3.625,
1729
- "learning_rate": 8.226185577745149e-06,
1730
- "loss": 0.125,
1731
- "step": 1230
1732
- },
1733
- {
1734
- "epoch": 3.616398243045388,
1735
- "grad_norm": 3.71875,
1736
- "learning_rate": 8.142281205725368e-06,
1737
- "loss": 0.1213,
1738
- "step": 1235
1739
- },
1740
- {
1741
- "epoch": 3.6310395314787702,
1742
- "grad_norm": 3.59375,
1743
- "learning_rate": 8.058512075061758e-06,
1744
- "loss": 0.1284,
1745
- "step": 1240
1746
- },
1747
- {
1748
- "epoch": 3.6456808199121524,
1749
- "grad_norm": 4.28125,
1750
- "learning_rate": 7.974884284121248e-06,
1751
- "loss": 0.1197,
1752
- "step": 1245
1753
- },
1754
- {
1755
- "epoch": 3.660322108345534,
1756
- "grad_norm": 3.828125,
1757
- "learning_rate": 7.891403920981251e-06,
1758
- "loss": 0.1318,
1759
- "step": 1250
1760
- },
1761
- {
1762
- "epoch": 3.6749633967789164,
1763
- "grad_norm": 4.09375,
1764
- "learning_rate": 7.808077062986515e-06,
1765
- "loss": 0.1209,
1766
- "step": 1255
1767
- },
1768
- {
1769
- "epoch": 3.6896046852122986,
1770
- "grad_norm": 4.125,
1771
- "learning_rate": 7.724909776306625e-06,
1772
- "loss": 0.1264,
1773
- "step": 1260
1774
- },
1775
- {
1776
- "epoch": 3.7042459736456808,
1777
- "grad_norm": 3.90625,
1778
- "learning_rate": 7.64190811549446e-06,
1779
- "loss": 0.1187,
1780
- "step": 1265
1781
- },
1782
- {
1783
- "epoch": 3.718887262079063,
1784
- "grad_norm": 3.65625,
1785
- "learning_rate": 7.5590781230453515e-06,
1786
- "loss": 0.1386,
1787
- "step": 1270
1788
- },
1789
- {
1790
- "epoch": 3.733528550512445,
1791
- "grad_norm": 3.21875,
1792
- "learning_rate": 7.4764258289572575e-06,
1793
- "loss": 0.1183,
1794
- "step": 1275
1795
- },
1796
- {
1797
- "epoch": 3.7481698389458273,
1798
- "grad_norm": 3.203125,
1799
- "learning_rate": 7.393957250291725e-06,
1800
- "loss": 0.1258,
1801
- "step": 1280
1802
- },
1803
- {
1804
- "epoch": 3.7628111273792095,
1805
- "grad_norm": 3.5625,
1806
- "learning_rate": 7.3116783907358975e-06,
1807
- "loss": 0.1217,
1808
- "step": 1285
1809
- },
1810
- {
1811
- "epoch": 3.7774524158125917,
1812
- "grad_norm": 3.8125,
1813
- "learning_rate": 7.229595240165406e-06,
1814
- "loss": 0.1238,
1815
- "step": 1290
1816
- },
1817
- {
1818
- "epoch": 3.792093704245974,
1819
- "grad_norm": 4.125,
1820
- "learning_rate": 7.1477137742083425e-06,
1821
- "loss": 0.1221,
1822
- "step": 1295
1823
- },
1824
- {
1825
- "epoch": 3.8067349926793557,
1826
- "grad_norm": 3.546875,
1827
- "learning_rate": 7.066039953810208e-06,
1828
- "loss": 0.1147,
1829
- "step": 1300
1830
- },
1831
- {
1832
- "epoch": 3.821376281112738,
1833
- "grad_norm": 3.890625,
1834
- "learning_rate": 6.984579724799985e-06,
1835
- "loss": 0.1248,
1836
- "step": 1305
1837
- },
1838
- {
1839
- "epoch": 3.83601756954612,
1840
- "grad_norm": 3.09375,
1841
- "learning_rate": 6.903339017457254e-06,
1842
- "loss": 0.1205,
1843
- "step": 1310
1844
- },
1845
- {
1846
- "epoch": 3.8506588579795022,
1847
- "grad_norm": 3.71875,
1848
- "learning_rate": 6.822323746080499e-06,
1849
- "loss": 0.1205,
1850
- "step": 1315
1851
- },
1852
- {
1853
- "epoch": 3.8653001464128844,
1854
- "grad_norm": 3.75,
1855
- "learning_rate": 6.741539808556525e-06,
1856
- "loss": 0.1258,
1857
- "step": 1320
1858
- },
1859
- {
1860
- "epoch": 3.8799414348462666,
1861
- "grad_norm": 4.03125,
1862
- "learning_rate": 6.660993085931113e-06,
1863
- "loss": 0.1225,
1864
- "step": 1325
1865
- },
1866
- {
1867
- "epoch": 3.8945827232796484,
1868
- "grad_norm": 3.625,
1869
- "learning_rate": 6.580689441980861e-06,
1870
- "loss": 0.1187,
1871
- "step": 1330
1872
- },
1873
- {
1874
- "epoch": 3.9092240117130306,
1875
- "grad_norm": 3.375,
1876
- "learning_rate": 6.5006347227863265e-06,
1877
- "loss": 0.1204,
1878
- "step": 1335
1879
- },
1880
- {
1881
- "epoch": 3.9238653001464128,
1882
- "grad_norm": 3.265625,
1883
- "learning_rate": 6.420834756306411e-06,
1884
- "loss": 0.1163,
1885
- "step": 1340
1886
- },
1887
- {
1888
- "epoch": 3.938506588579795,
1889
- "grad_norm": 3.4375,
1890
- "learning_rate": 6.341295351954105e-06,
1891
- "loss": 0.1184,
1892
- "step": 1345
1893
- },
1894
- {
1895
- "epoch": 3.953147877013177,
1896
- "grad_norm": 3.546875,
1897
- "learning_rate": 6.262022300173549e-06,
1898
- "loss": 0.1159,
1899
- "step": 1350
1900
- },
1901
- {
1902
- "epoch": 3.9677891654465594,
1903
- "grad_norm": 4.40625,
1904
- "learning_rate": 6.183021372018508e-06,
1905
- "loss": 0.1212,
1906
- "step": 1355
1907
- },
1908
- {
1909
- "epoch": 3.9824304538799415,
1910
- "grad_norm": 3.4375,
1911
- "learning_rate": 6.104298318732218e-06,
1912
- "loss": 0.1168,
1913
- "step": 1360
1914
- },
1915
- {
1916
- "epoch": 3.9970717423133237,
1917
- "grad_norm": 3.8125,
1918
- "learning_rate": 6.025858871328721e-06,
1919
- "loss": 0.1169,
1920
- "step": 1365
1921
- },
1922
- {
1923
- "epoch": 4.011713030746706,
1924
- "grad_norm": 1.6875,
1925
- "learning_rate": 5.947708740175633e-06,
1926
- "loss": 0.0969,
1927
- "step": 1370
1928
- },
1929
- {
1930
- "epoch": 4.026354319180088,
1931
- "grad_norm": 2.4375,
1932
- "learning_rate": 5.869853614578438e-06,
1933
- "loss": 0.0787,
1934
- "step": 1375
1935
- },
1936
- {
1937
- "epoch": 4.04099560761347,
1938
- "grad_norm": 1.9140625,
1939
- "learning_rate": 5.792299162366304e-06,
1940
- "loss": 0.0806,
1941
- "step": 1380
1942
- },
1943
- {
1944
- "epoch": 4.0556368960468525,
1945
- "grad_norm": 4.59375,
1946
- "learning_rate": 5.715051029479475e-06,
1947
- "loss": 0.0794,
1948
- "step": 1385
1949
- },
1950
- {
1951
- "epoch": 4.070278184480234,
1952
- "grad_norm": 2.109375,
1953
- "learning_rate": 5.638114839558233e-06,
1954
- "loss": 0.0756,
1955
- "step": 1390
1956
- },
1957
- {
1958
- "epoch": 4.084919472913616,
1959
- "grad_norm": 2.828125,
1960
- "learning_rate": 5.561496193533516e-06,
1961
- "loss": 0.0793,
1962
- "step": 1395
1963
- },
1964
- {
1965
- "epoch": 4.099560761346998,
1966
- "grad_norm": 2.578125,
1967
- "learning_rate": 5.485200669219155e-06,
1968
- "loss": 0.0749,
1969
- "step": 1400
1970
- },
1971
- {
1972
- "epoch": 4.11420204978038,
1973
- "grad_norm": 2.078125,
1974
- "learning_rate": 5.4092338209058375e-06,
1975
- "loss": 0.0752,
1976
- "step": 1405
1977
- },
1978
- {
1979
- "epoch": 4.128843338213763,
1980
- "grad_norm": 2.6875,
1981
- "learning_rate": 5.333601178956718e-06,
1982
- "loss": 0.0749,
1983
- "step": 1410
1984
- },
1985
- {
1986
- "epoch": 4.143484626647145,
1987
- "grad_norm": 2.1875,
1988
- "learning_rate": 5.258308249404853e-06,
1989
- "loss": 0.0764,
1990
- "step": 1415
1991
- },
1992
- {
1993
- "epoch": 4.158125915080527,
1994
- "grad_norm": 2.234375,
1995
- "learning_rate": 5.183360513552313e-06,
1996
- "loss": 0.0774,
1997
- "step": 1420
1998
- },
1999
- {
2000
- "epoch": 4.172767203513909,
2001
- "grad_norm": 2.1875,
2002
- "learning_rate": 5.108763427571203e-06,
2003
- "loss": 0.0769,
2004
- "step": 1425
2005
- },
2006
- {
2007
- "epoch": 4.187408491947291,
2008
- "grad_norm": 2.375,
2009
- "learning_rate": 5.034522422106403e-06,
2010
- "loss": 0.0808,
2011
- "step": 1430
2012
- },
2013
- {
2014
- "epoch": 4.202049780380674,
2015
- "grad_norm": 2.796875,
2016
- "learning_rate": 4.96064290188026e-06,
2017
- "loss": 0.0831,
2018
- "step": 1435
2019
- },
2020
- {
2021
- "epoch": 4.216691068814056,
2022
- "grad_norm": 2.703125,
2023
- "learning_rate": 4.887130245299082e-06,
2024
- "loss": 0.0774,
2025
- "step": 1440
2026
- },
2027
- {
2028
- "epoch": 4.231332357247438,
2029
- "grad_norm": 2.265625,
2030
- "learning_rate": 4.813989804061644e-06,
2031
- "loss": 0.0782,
2032
- "step": 1445
2033
- },
2034
- {
2035
- "epoch": 4.24597364568082,
2036
- "grad_norm": 2.28125,
2037
- "learning_rate": 4.741226902769537e-06,
2038
- "loss": 0.0801,
2039
- "step": 1450
2040
- },
2041
- {
2042
- "epoch": 4.260614934114202,
2043
- "grad_norm": 2.546875,
2044
- "learning_rate": 4.668846838539581e-06,
2045
- "loss": 0.0787,
2046
- "step": 1455
2047
- },
2048
- {
2049
- "epoch": 4.2752562225475845,
2050
- "grad_norm": 3.03125,
2051
- "learning_rate": 4.596854880618149e-06,
2052
- "loss": 0.0782,
2053
- "step": 1460
2054
- },
2055
- {
2056
- "epoch": 4.289897510980967,
2057
- "grad_norm": 2.46875,
2058
- "learning_rate": 4.525256269997621e-06,
2059
- "loss": 0.0797,
2060
- "step": 1465
2061
- },
2062
- {
2063
- "epoch": 4.304538799414349,
2064
- "grad_norm": 3.171875,
2065
- "learning_rate": 4.4540562190347935e-06,
2066
- "loss": 0.0788,
2067
- "step": 1470
2068
- },
2069
- {
2070
- "epoch": 4.31918008784773,
2071
- "grad_norm": 2.578125,
2072
- "learning_rate": 4.383259911071465e-06,
2073
- "loss": 0.0797,
2074
- "step": 1475
2075
- },
2076
- {
2077
- "epoch": 4.333821376281112,
2078
- "grad_norm": 2.078125,
2079
- "learning_rate": 4.31287250005704e-06,
2080
- "loss": 0.0794,
2081
- "step": 1480
2082
- },
2083
- {
2084
- "epoch": 4.348462664714495,
2085
- "grad_norm": 2.375,
2086
- "learning_rate": 4.242899110173375e-06,
2087
- "loss": 0.0785,
2088
- "step": 1485
2089
- },
2090
- {
2091
- "epoch": 4.363103953147877,
2092
- "grad_norm": 2.125,
2093
- "learning_rate": 4.173344835461701e-06,
2094
- "loss": 0.0752,
2095
- "step": 1490
2096
- },
2097
- {
2098
- "epoch": 4.377745241581259,
2099
- "grad_norm": 3.796875,
2100
- "learning_rate": 4.1042147394518106e-06,
2101
- "loss": 0.0789,
2102
- "step": 1495
2103
- },
2104
- {
2105
- "epoch": 4.392386530014641,
2106
- "grad_norm": 2.203125,
2107
- "learning_rate": 4.035513854793389e-06,
2108
- "loss": 0.0754,
2109
- "step": 1500
2110
- },
2111
- {
2112
- "epoch": 4.407027818448023,
2113
- "grad_norm": 2.390625,
2114
- "learning_rate": 3.967247182889698e-06,
2115
- "loss": 0.0793,
2116
- "step": 1505
2117
- },
2118
- {
2119
- "epoch": 4.421669106881406,
2120
- "grad_norm": 2.359375,
2121
- "learning_rate": 3.899419693533423e-06,
2122
- "loss": 0.0789,
2123
- "step": 1510
2124
- },
2125
- {
2126
- "epoch": 4.436310395314788,
2127
- "grad_norm": 2.515625,
2128
- "learning_rate": 3.832036324544912e-06,
2129
- "loss": 0.0769,
2130
- "step": 1515
2131
- },
2132
- {
2133
- "epoch": 4.45095168374817,
2134
- "grad_norm": 2.015625,
2135
- "learning_rate": 3.7651019814126656e-06,
2136
- "loss": 0.0772,
2137
- "step": 1520
2138
- },
2139
- {
2140
- "epoch": 4.465592972181552,
2141
- "grad_norm": 2.390625,
2142
- "learning_rate": 3.698621536936262e-06,
2143
- "loss": 0.0794,
2144
- "step": 1525
2145
- },
2146
- {
2147
- "epoch": 4.480234260614934,
2148
- "grad_norm": 2.40625,
2149
- "learning_rate": 3.6325998308715827e-06,
2150
- "loss": 0.0787,
2151
- "step": 1530
2152
- },
2153
- {
2154
- "epoch": 4.4948755490483165,
2155
- "grad_norm": 2.140625,
2156
- "learning_rate": 3.567041669578507e-06,
2157
- "loss": 0.0777,
2158
- "step": 1535
2159
- },
2160
- {
2161
- "epoch": 4.509516837481699,
2162
- "grad_norm": 1.859375,
2163
- "learning_rate": 3.5019518256709773e-06,
2164
- "loss": 0.0792,
2165
- "step": 1540
2166
- },
2167
- {
2168
- "epoch": 4.524158125915081,
2169
- "grad_norm": 2.296875,
2170
- "learning_rate": 3.437335037669598e-06,
2171
- "loss": 0.0787,
2172
- "step": 1545
2173
- },
2174
- {
2175
- "epoch": 4.538799414348462,
2176
- "grad_norm": 2.265625,
2177
- "learning_rate": 3.3731960096566297e-06,
2178
- "loss": 0.0774,
2179
- "step": 1550
2180
- },
2181
- {
2182
- "epoch": 4.553440702781844,
2183
- "grad_norm": 2.21875,
2184
- "learning_rate": 3.3095394109335686e-06,
2185
- "loss": 0.0796,
2186
- "step": 1555
2187
- },
2188
- {
2189
- "epoch": 4.568081991215227,
2190
- "grad_norm": 2.296875,
2191
- "learning_rate": 3.2463698756811966e-06,
2192
- "loss": 0.0768,
2193
- "step": 1560
2194
- },
2195
- {
2196
- "epoch": 4.582723279648609,
2197
- "grad_norm": 2.90625,
2198
- "learning_rate": 3.1836920026222297e-06,
2199
- "loss": 0.0784,
2200
- "step": 1565
2201
- },
2202
- {
2203
- "epoch": 4.597364568081991,
2204
- "grad_norm": 2.171875,
2205
- "learning_rate": 3.12151035468652e-06,
2206
- "loss": 0.078,
2207
- "step": 1570
2208
- },
2209
- {
2210
- "epoch": 4.612005856515373,
2211
- "grad_norm": 3.015625,
2212
- "learning_rate": 3.059829458678899e-06,
2213
- "loss": 0.0778,
2214
- "step": 1575
2215
- },
2216
- {
2217
- "epoch": 4.626647144948755,
2218
- "grad_norm": 2.421875,
2219
- "learning_rate": 2.9986538049495984e-06,
2220
- "loss": 0.0768,
2221
- "step": 1580
2222
- },
2223
- {
2224
- "epoch": 4.641288433382138,
2225
- "grad_norm": 2.09375,
2226
- "learning_rate": 2.937987847067372e-06,
2227
- "loss": 0.0755,
2228
- "step": 1585
2229
- },
2230
- {
2231
- "epoch": 4.65592972181552,
2232
- "grad_norm": 2.265625,
2233
- "learning_rate": 2.877836001495269e-06,
2234
- "loss": 0.0792,
2235
- "step": 1590
2236
- },
2237
- {
2238
- "epoch": 4.670571010248902,
2239
- "grad_norm": 2.140625,
2240
- "learning_rate": 2.8182026472691303e-06,
2241
- "loss": 0.0778,
2242
- "step": 1595
2243
- },
2244
- {
2245
- "epoch": 4.685212298682284,
2246
- "grad_norm": 3.3125,
2247
- "learning_rate": 2.7590921256787797e-06,
2248
- "loss": 0.0783,
2249
- "step": 1600
2250
- },
2251
- {
2252
- "epoch": 4.699853587115666,
2253
- "grad_norm": 3.0625,
2254
- "learning_rate": 2.7005087399519836e-06,
2255
- "loss": 0.0813,
2256
- "step": 1605
2257
- },
2258
- {
2259
- "epoch": 4.714494875549049,
2260
- "grad_norm": 2.375,
2261
- "learning_rate": 2.6424567549411838e-06,
2262
- "loss": 0.082,
2263
- "step": 1610
2264
- },
2265
- {
2266
- "epoch": 4.729136163982431,
2267
- "grad_norm": 2.359375,
2268
- "learning_rate": 2.5849403968130182e-06,
2269
- "loss": 0.0721,
2270
- "step": 1615
2271
- },
2272
- {
2273
- "epoch": 4.743777452415813,
2274
- "grad_norm": 2.15625,
2275
- "learning_rate": 2.5279638527406426e-06,
2276
- "loss": 0.0774,
2277
- "step": 1620
2278
- },
2279
- {
2280
- "epoch": 4.758418740849194,
2281
- "grad_norm": 2.46875,
2282
- "learning_rate": 2.4715312705989236e-06,
2283
- "loss": 0.0816,
2284
- "step": 1625
2285
- },
2286
- {
2287
- "epoch": 4.773060029282577,
2288
- "grad_norm": 2.484375,
2289
- "learning_rate": 2.4156467586624588e-06,
2290
- "loss": 0.0793,
2291
- "step": 1630
2292
- },
2293
- {
2294
- "epoch": 4.787701317715959,
2295
- "grad_norm": 1.9609375,
2296
- "learning_rate": 2.3603143853065146e-06,
2297
- "loss": 0.0755,
2298
- "step": 1635
2299
- },
2300
- {
2301
- "epoch": 4.802342606149341,
2302
- "grad_norm": 2.25,
2303
- "learning_rate": 2.305538178710831e-06,
2304
- "loss": 0.081,
2305
- "step": 1640
2306
- },
2307
- {
2308
- "epoch": 4.816983894582723,
2309
- "grad_norm": 2.84375,
2310
- "learning_rate": 2.25132212656638e-06,
2311
- "loss": 0.0824,
2312
- "step": 1645
2313
- },
2314
- {
2315
- "epoch": 4.831625183016105,
2316
- "grad_norm": 2.921875,
2317
- "learning_rate": 2.1976701757850603e-06,
2318
- "loss": 0.0757,
2319
- "step": 1650
2320
- },
2321
- {
2322
- "epoch": 4.846266471449487,
2323
- "grad_norm": 2.5,
2324
- "learning_rate": 2.1445862322123734e-06,
2325
- "loss": 0.0787,
2326
- "step": 1655
2327
- },
2328
- {
2329
- "epoch": 4.86090775988287,
2330
- "grad_norm": 2.171875,
2331
- "learning_rate": 2.092074160343063e-06,
2332
- "loss": 0.0751,
2333
- "step": 1660
2334
- },
2335
- {
2336
- "epoch": 4.875549048316252,
2337
- "grad_norm": 2.265625,
2338
- "learning_rate": 2.0401377830397874e-06,
2339
- "loss": 0.0797,
2340
- "step": 1665
2341
- },
2342
- {
2343
- "epoch": 4.890190336749634,
2344
- "grad_norm": 2.40625,
2345
- "learning_rate": 1.9887808812548272e-06,
2346
- "loss": 0.0764,
2347
- "step": 1670
2348
- },
2349
- {
2350
- "epoch": 4.904831625183016,
2351
- "grad_norm": 2.109375,
2352
- "learning_rate": 1.938007193754816e-06,
2353
- "loss": 0.0809,
2354
- "step": 1675
2355
- },
2356
- {
2357
- "epoch": 4.919472913616398,
2358
- "grad_norm": 2.3125,
2359
- "learning_rate": 1.8878204168485691e-06,
2360
- "loss": 0.0792,
2361
- "step": 1680
2362
- },
2363
- {
2364
- "epoch": 4.934114202049781,
2365
- "grad_norm": 2.34375,
2366
- "learning_rate": 1.8382242041179876e-06,
2367
- "loss": 0.0775,
2368
- "step": 1685
2369
- },
2370
- {
2371
- "epoch": 4.948755490483163,
2372
- "grad_norm": 2.328125,
2373
- "learning_rate": 1.7892221661520925e-06,
2374
- "loss": 0.0836,
2375
- "step": 1690
2376
- },
2377
- {
2378
- "epoch": 4.963396778916545,
2379
- "grad_norm": 2.453125,
2380
- "learning_rate": 1.740817870284155e-06,
2381
- "loss": 0.0768,
2382
- "step": 1695
2383
- },
2384
- {
2385
- "epoch": 4.978038067349927,
2386
- "grad_norm": 3.09375,
2387
- "learning_rate": 1.693014840332009e-06,
2388
- "loss": 0.0746,
2389
- "step": 1700
2390
- },
2391
- {
2392
- "epoch": 4.992679355783309,
2393
- "grad_norm": 2.25,
2394
- "learning_rate": 1.645816556341513e-06,
2395
- "loss": 0.0769,
2396
- "step": 1705
2397
- },
2398
- {
2399
- "epoch": 5.007320644216691,
2400
- "grad_norm": 2.0625,
2401
- "learning_rate": 1.5992264543332125e-06,
2402
- "loss": 0.0747,
2403
- "step": 1710
2404
- },
2405
- {
2406
- "epoch": 5.021961932650073,
2407
- "grad_norm": 2.109375,
2408
- "learning_rate": 1.5532479260521849e-06,
2409
- "loss": 0.0711,
2410
- "step": 1715
2411
- },
2412
- {
2413
- "epoch": 5.036603221083455,
2414
- "grad_norm": 1.4375,
2415
- "learning_rate": 1.507884318721131e-06,
2416
- "loss": 0.0685,
2417
- "step": 1720
2418
- },
2419
- {
2420
- "epoch": 5.051244509516837,
2421
- "grad_norm": 1.8359375,
2422
- "learning_rate": 1.463138934796694e-06,
2423
- "loss": 0.0653,
2424
- "step": 1725
2425
- },
2426
- {
2427
- "epoch": 5.065885797950219,
2428
- "grad_norm": 1.59375,
2429
- "learning_rate": 1.4190150317290485e-06,
2430
- "loss": 0.0697,
2431
- "step": 1730
2432
- },
2433
- {
2434
- "epoch": 5.080527086383602,
2435
- "grad_norm": 2.640625,
2436
- "learning_rate": 1.3755158217247488e-06,
2437
- "loss": 0.0696,
2438
- "step": 1735
2439
- },
2440
- {
2441
- "epoch": 5.095168374816984,
2442
- "grad_norm": 1.6640625,
2443
- "learning_rate": 1.3326444715128884e-06,
2444
- "loss": 0.068,
2445
- "step": 1740
2446
- },
2447
- {
2448
- "epoch": 5.109809663250366,
2449
- "grad_norm": 1.9765625,
2450
- "learning_rate": 1.2904041021145597e-06,
2451
- "loss": 0.0688,
2452
- "step": 1745
2453
- },
2454
- {
2455
- "epoch": 5.124450951683748,
2456
- "grad_norm": 1.8203125,
2457
- "learning_rate": 1.2487977886156522e-06,
2458
- "loss": 0.0689,
2459
- "step": 1750
2460
- },
2461
- {
2462
- "epoch": 5.13909224011713,
2463
- "grad_norm": 1.9765625,
2464
- "learning_rate": 1.207828559942974e-06,
2465
- "loss": 0.0688,
2466
- "step": 1755
2467
- },
2468
- {
2469
- "epoch": 5.153733528550513,
2470
- "grad_norm": 1.7578125,
2471
- "learning_rate": 1.1674993986437567e-06,
2472
- "loss": 0.0684,
2473
- "step": 1760
2474
- },
2475
- {
2476
- "epoch": 5.168374816983895,
2477
- "grad_norm": 1.4921875,
2478
- "learning_rate": 1.1278132406685226e-06,
2479
- "loss": 0.0688,
2480
- "step": 1765
2481
- },
2482
- {
2483
- "epoch": 5.183016105417277,
2484
- "grad_norm": 1.875,
2485
- "learning_rate": 1.0887729751573562e-06,
2486
- "loss": 0.0715,
2487
- "step": 1770
2488
- },
2489
- {
2490
- "epoch": 5.197657393850659,
2491
- "grad_norm": 1.859375,
2492
- "learning_rate": 1.0503814442295624e-06,
2493
- "loss": 0.0712,
2494
- "step": 1775
2495
- },
2496
- {
2497
- "epoch": 5.212298682284041,
2498
- "grad_norm": 1.5703125,
2499
- "learning_rate": 1.0126414427767716e-06,
2500
- "loss": 0.0677,
2501
- "step": 1780
2502
- },
2503
- {
2504
- "epoch": 5.2269399707174236,
2505
- "grad_norm": 1.703125,
2506
- "learning_rate": 9.755557182594656e-07,
2507
- "loss": 0.0706,
2508
- "step": 1785
2509
- },
2510
- {
2511
- "epoch": 5.241581259150805,
2512
- "grad_norm": 1.59375,
2513
- "learning_rate": 9.391269705069739e-07,
2514
- "loss": 0.0729,
2515
- "step": 1790
2516
- },
2517
- {
2518
- "epoch": 5.256222547584187,
2519
- "grad_norm": 1.8125,
2520
- "learning_rate": 9.033578515209108e-07,
2521
- "loss": 0.0671,
2522
- "step": 1795
2523
- },
2524
- {
2525
- "epoch": 5.270863836017569,
2526
- "grad_norm": 1.953125,
2527
- "learning_rate": 8.682509652821303e-07,
2528
- "loss": 0.0685,
2529
- "step": 1800
2530
- },
2531
- {
2532
- "epoch": 5.285505124450951,
2533
- "grad_norm": 1.5390625,
2534
- "learning_rate": 8.338088675611323e-07,
2535
- "loss": 0.0682,
2536
- "step": 1805
2537
- },
2538
- {
2539
- "epoch": 5.300146412884334,
2540
- "grad_norm": 1.84375,
2541
- "learning_rate": 8.000340657320304e-07,
2542
- "loss": 0.0681,
2543
- "step": 1810
2544
- },
2545
- {
2546
- "epoch": 5.314787701317716,
2547
- "grad_norm": 2.234375,
2548
- "learning_rate": 7.669290185899947e-07,
2549
- "loss": 0.0683,
2550
- "step": 1815
2551
- },
2552
- {
2553
- "epoch": 5.329428989751098,
2554
- "grad_norm": 1.9296875,
2555
- "learning_rate": 7.34496136172268e-07,
2556
- "loss": 0.0701,
2557
- "step": 1820
2558
- },
2559
- {
2560
- "epoch": 5.34407027818448,
2561
- "grad_norm": 1.7734375,
2562
- "learning_rate": 7.027377795826962e-07,
2563
- "loss": 0.0696,
2564
- "step": 1825
2565
- },
2566
- {
2567
- "epoch": 5.358711566617862,
2568
- "grad_norm": 1.3984375,
2569
- "learning_rate": 6.716562608198651e-07,
2570
- "loss": 0.065,
2571
- "step": 1830
2572
- },
2573
- {
2574
- "epoch": 5.373352855051245,
2575
- "grad_norm": 1.6015625,
2576
- "learning_rate": 6.412538426087667e-07,
2577
- "loss": 0.0741,
2578
- "step": 1835
2579
- },
2580
- {
2581
- "epoch": 5.387994143484627,
2582
- "grad_norm": 2.0,
2583
- "learning_rate": 6.115327382360892e-07,
2584
- "loss": 0.0707,
2585
- "step": 1840
2586
- },
2587
- {
2588
- "epoch": 5.402635431918009,
2589
- "grad_norm": 1.8828125,
2590
- "learning_rate": 5.82495111389072e-07,
2591
- "loss": 0.0675,
2592
- "step": 1845
2593
- },
2594
- {
2595
- "epoch": 5.417276720351391,
2596
- "grad_norm": 1.59375,
2597
- "learning_rate": 5.541430759980137e-07,
2598
- "loss": 0.0657,
2599
- "step": 1850
2600
- },
2601
- {
2602
- "epoch": 5.431918008784773,
2603
- "grad_norm": 2.0,
2604
- "learning_rate": 5.264786960823565e-07,
2605
- "loss": 0.0684,
2606
- "step": 1855
2607
- },
2608
- {
2609
- "epoch": 5.446559297218156,
2610
- "grad_norm": 1.7109375,
2611
- "learning_rate": 4.995039856004447e-07,
2612
- "loss": 0.0687,
2613
- "step": 1860
2614
- },
2615
- {
2616
- "epoch": 5.461200585651538,
2617
- "grad_norm": 2.046875,
2618
- "learning_rate": 4.7322090830288825e-07,
2619
- "loss": 0.0722,
2620
- "step": 1865
2621
- },
2622
- {
2623
- "epoch": 5.475841874084919,
2624
- "grad_norm": 1.5,
2625
- "learning_rate": 4.4763137758962685e-07,
2626
- "loss": 0.0672,
2627
- "step": 1870
2628
- },
2629
- {
2630
- "epoch": 5.490483162518301,
2631
- "grad_norm": 1.8203125,
2632
- "learning_rate": 4.227372563706134e-07,
2633
- "loss": 0.0691,
2634
- "step": 1875
2635
- },
2636
- {
2637
- "epoch": 5.5051244509516835,
2638
- "grad_norm": 1.828125,
2639
- "learning_rate": 3.985403569302093e-07,
2640
- "loss": 0.0669,
2641
- "step": 1880
2642
- },
2643
- {
2644
- "epoch": 5.519765739385066,
2645
- "grad_norm": 1.953125,
2646
- "learning_rate": 3.7504244079524023e-07,
2647
- "loss": 0.0656,
2648
- "step": 1885
2649
- },
2650
- {
2651
- "epoch": 5.534407027818448,
2652
- "grad_norm": 2.171875,
2653
- "learning_rate": 3.522452186067671e-07,
2654
- "loss": 0.0678,
2655
- "step": 1890
2656
- },
2657
- {
2658
- "epoch": 5.54904831625183,
2659
- "grad_norm": 1.6484375,
2660
- "learning_rate": 3.3015034999554273e-07,
2661
- "loss": 0.0702,
2662
- "step": 1895
2663
- },
2664
- {
2665
- "epoch": 5.563689604685212,
2666
- "grad_norm": 1.6484375,
2667
- "learning_rate": 3.08759443461204e-07,
2668
- "loss": 0.0668,
2669
- "step": 1900
2670
- },
2671
- {
2672
- "epoch": 5.578330893118594,
2673
- "grad_norm": 1.8515625,
2674
- "learning_rate": 2.8807405625515206e-07,
2675
- "loss": 0.0692,
2676
- "step": 1905
2677
- },
2678
- {
2679
- "epoch": 5.592972181551977,
2680
- "grad_norm": 1.6796875,
2681
- "learning_rate": 2.680956942672119e-07,
2682
- "loss": 0.0673,
2683
- "step": 1910
2684
- },
2685
- {
2686
- "epoch": 5.607613469985359,
2687
- "grad_norm": 1.6875,
2688
- "learning_rate": 2.488258119159814e-07,
2689
- "loss": 0.0714,
2690
- "step": 1915
2691
- },
2692
- {
2693
- "epoch": 5.622254758418741,
2694
- "grad_norm": 1.6875,
2695
- "learning_rate": 2.302658120429635e-07,
2696
- "loss": 0.0663,
2697
- "step": 1920
2698
- },
2699
- {
2700
- "epoch": 5.636896046852123,
2701
- "grad_norm": 1.359375,
2702
- "learning_rate": 2.1241704581043132e-07,
2703
- "loss": 0.0626,
2704
- "step": 1925
2705
- },
2706
- {
2707
- "epoch": 5.651537335285505,
2708
- "grad_norm": 1.71875,
2709
- "learning_rate": 1.9528081260307363e-07,
2710
- "loss": 0.0658,
2711
- "step": 1930
2712
- },
2713
- {
2714
- "epoch": 5.666178623718888,
2715
- "grad_norm": 1.8125,
2716
- "learning_rate": 1.7885835993338817e-07,
2717
- "loss": 0.0701,
2718
- "step": 1935
2719
- },
2720
- {
2721
- "epoch": 5.68081991215227,
2722
- "grad_norm": 1.9921875,
2723
- "learning_rate": 1.6315088335087549e-07,
2724
- "loss": 0.0715,
2725
- "step": 1940
2726
- },
2727
- {
2728
- "epoch": 5.695461200585651,
2729
- "grad_norm": 1.5234375,
2730
- "learning_rate": 1.4815952635499063e-07,
2731
- "loss": 0.0657,
2732
- "step": 1945
2733
- },
2734
- {
2735
- "epoch": 5.710102489019034,
2736
- "grad_norm": 1.921875,
2737
- "learning_rate": 1.3388538031190888e-07,
2738
- "loss": 0.0682,
2739
- "step": 1950
2740
- },
2741
- {
2742
- "epoch": 5.7247437774524155,
2743
- "grad_norm": 1.578125,
2744
- "learning_rate": 1.2032948437506574e-07,
2745
- "loss": 0.0667,
2746
- "step": 1955
2747
- },
2748
- {
2749
- "epoch": 5.739385065885798,
2750
- "grad_norm": 1.796875,
2751
- "learning_rate": 1.07492825409512e-07,
2752
- "loss": 0.071,
2753
- "step": 1960
2754
- },
2755
- {
2756
- "epoch": 5.75402635431918,
2757
- "grad_norm": 1.9140625,
2758
- "learning_rate": 9.537633792006673e-08,
2759
- "loss": 0.0676,
2760
- "step": 1965
2761
- },
2762
- {
2763
- "epoch": 5.768667642752562,
2764
- "grad_norm": 1.53125,
2765
- "learning_rate": 8.39809039832884e-08,
2766
- "loss": 0.0722,
2767
- "step": 1970
2768
- },
2769
- {
2770
- "epoch": 5.783308931185944,
2771
- "grad_norm": 1.8515625,
2772
- "learning_rate": 7.330735318325843e-08,
2773
- "loss": 0.0707,
2774
- "step": 1975
2775
- },
2776
- {
2777
- "epoch": 5.797950219619326,
2778
- "grad_norm": 1.640625,
2779
- "learning_rate": 6.335646255118844e-08,
2780
- "loss": 0.0697,
2781
- "step": 1980
2782
- },
2783
- {
2784
- "epoch": 5.812591508052709,
2785
- "grad_norm": 1.921875,
2786
- "learning_rate": 5.412895650885319e-08,
2787
- "loss": 0.0709,
2788
- "step": 1985
2789
- },
2790
- {
2791
- "epoch": 5.827232796486091,
2792
- "grad_norm": 1.734375,
2793
- "learning_rate": 4.562550681584954e-08,
2794
- "loss": 0.0702,
2795
- "step": 1990
2796
- },
2797
- {
2798
- "epoch": 5.841874084919473,
2799
- "grad_norm": 1.828125,
2800
- "learning_rate": 3.784673252069659e-08,
2801
- "loss": 0.0713,
2802
- "step": 1995
2803
- },
2804
- {
2805
- "epoch": 5.856515373352855,
2806
- "grad_norm": 1.8984375,
2807
- "learning_rate": 3.079319991576957e-08,
2808
- "loss": 0.0677,
2809
- "step": 2000
2810
- },
2811
- {
2812
- "epoch": 5.871156661786237,
2813
- "grad_norm": 2.03125,
2814
- "learning_rate": 2.4465422496069425e-08,
2815
- "loss": 0.0671,
2816
- "step": 2005
2817
- },
2818
- {
2819
- "epoch": 5.88579795021962,
2820
- "grad_norm": 1.9921875,
2821
- "learning_rate": 1.886386092184389e-08,
2822
- "loss": 0.07,
2823
- "step": 2010
2824
- },
2825
- {
2826
- "epoch": 5.900439238653002,
2827
- "grad_norm": 2.046875,
2828
- "learning_rate": 1.3988922985048724e-08,
2829
- "loss": 0.0721,
2830
- "step": 2015
2831
- },
2832
- {
2833
- "epoch": 5.915080527086384,
2834
- "grad_norm": 1.9375,
2835
- "learning_rate": 9.840963579667017e-09,
2836
- "loss": 0.0671,
2837
- "step": 2020
2838
- },
2839
- {
2840
- "epoch": 5.929721815519766,
2841
- "grad_norm": 1.765625,
2842
- "learning_rate": 6.420284675865418e-09,
2843
- "loss": 0.0665,
2844
- "step": 2025
2845
- },
2846
- {
2847
- "epoch": 5.9443631039531475,
2848
- "grad_norm": 1.8984375,
2849
- "learning_rate": 3.7271352980139395e-09,
2850
- "loss": 0.0653,
2851
- "step": 2030
2852
- },
2853
- {
2854
- "epoch": 5.95900439238653,
2855
- "grad_norm": 1.7265625,
2856
- "learning_rate": 1.7617115065593493e-09,
2857
- "loss": 0.0691,
2858
- "step": 2035
2859
- },
2860
- {
2861
- "epoch": 5.973645680819912,
2862
- "grad_norm": 1.671875,
2863
- "learning_rate": 5.241563837476982e-10,
2864
- "loss": 0.0699,
2865
- "step": 2040
2866
- },
2867
- {
2868
- "epoch": 5.988286969253294,
2869
- "grad_norm": 1.671875,
2870
- "learning_rate": 1.4560023211540598e-11,
2871
- "loss": 0.0676,
2872
- "step": 2045
2873
- },
2874
- {
2875
- "epoch": 5.991215226939971,
2876
- "step": 2046,
2877
- "total_flos": 2.3380471586755123e+17,
2878
- "train_loss": 0.343939907065905,
2879
- "train_runtime": 7450.6058,
2880
- "train_samples_per_second": 8.795,
2881
- "train_steps_per_second": 0.275
2882
  }
2883
  ],
2884
  "logging_steps": 5,
2885
- "max_steps": 2046,
2886
  "num_input_tokens_seen": 0,
2887
  "num_train_epochs": 6,
2888
  "save_steps": 999999,
@@ -2898,7 +931,7 @@
2898
  "attributes": {}
2899
  }
2900
  },
2901
- "total_flos": 2.3380471586755123e+17,
2902
  "train_batch_size": 8,
2903
  "trial_name": null,
2904
  "trial_params": null
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 5.958236658932715,
5
  "eval_steps": 500,
6
+ "global_step": 642,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.04640371229698376,
13
+ "grad_norm": 304.0,
14
+ "learning_rate": 1.5384615384615387e-06,
15
+ "loss": 2.4879,
16
  "step": 5
17
  },
18
  {
19
+ "epoch": 0.09280742459396751,
20
+ "grad_norm": 58.5,
21
+ "learning_rate": 3.0769230769230774e-06,
22
+ "loss": 2.2869,
23
  "step": 10
24
  },
25
  {
26
+ "epoch": 0.13921113689095127,
27
+ "grad_norm": 208.0,
28
+ "learning_rate": 4.615384615384616e-06,
29
+ "loss": 1.8801,
30
  "step": 15
31
  },
32
  {
33
+ "epoch": 0.18561484918793503,
34
+ "grad_norm": 7.40625,
35
+ "learning_rate": 6.153846153846155e-06,
36
+ "loss": 1.4544,
37
  "step": 20
38
  },
39
  {
40
+ "epoch": 0.23201856148491878,
41
+ "grad_norm": 5.71875,
42
+ "learning_rate": 7.692307692307694e-06,
43
+ "loss": 1.1734,
44
  "step": 25
45
  },
46
  {
47
+ "epoch": 0.27842227378190254,
48
+ "grad_norm": 4.65625,
49
+ "learning_rate": 9.230769230769232e-06,
50
+ "loss": 1.006,
51
  "step": 30
52
  },
53
  {
54
+ "epoch": 0.3248259860788863,
55
+ "grad_norm": 3.984375,
56
+ "learning_rate": 1.076923076923077e-05,
57
+ "loss": 0.9744,
58
  "step": 35
59
  },
60
  {
61
+ "epoch": 0.37122969837587005,
62
+ "grad_norm": 4.34375,
63
+ "learning_rate": 1.230769230769231e-05,
64
+ "loss": 0.9372,
65
  "step": 40
66
  },
67
  {
68
+ "epoch": 0.4176334106728538,
69
+ "grad_norm": 3.6875,
70
+ "learning_rate": 1.3846153846153847e-05,
71
+ "loss": 0.8942,
72
  "step": 45
73
  },
74
  {
75
+ "epoch": 0.46403712296983757,
76
+ "grad_norm": 3.6875,
77
+ "learning_rate": 1.5384615384615387e-05,
78
+ "loss": 0.8729,
79
  "step": 50
80
  },
81
  {
82
+ "epoch": 0.5104408352668214,
83
+ "grad_norm": 3.8125,
84
+ "learning_rate": 1.6923076923076924e-05,
85
+ "loss": 0.8978,
86
  "step": 55
87
  },
88
  {
89
+ "epoch": 0.5568445475638051,
90
+ "grad_norm": 3.546875,
91
+ "learning_rate": 1.8461538461538465e-05,
92
+ "loss": 0.8524,
93
  "step": 60
94
  },
95
  {
96
+ "epoch": 0.6032482598607889,
97
+ "grad_norm": 3.5625,
98
+ "learning_rate": 2e-05,
99
+ "loss": 0.8231,
100
  "step": 65
101
  },
102
  {
103
+ "epoch": 0.6496519721577726,
104
+ "grad_norm": 3.28125,
105
+ "learning_rate": 1.9996294632312766e-05,
106
+ "loss": 0.8315,
107
  "step": 70
108
  },
109
  {
110
+ "epoch": 0.6960556844547564,
111
+ "grad_norm": 3.65625,
112
+ "learning_rate": 1.9985181275201e-05,
113
+ "loss": 0.8262,
114
  "step": 75
115
  },
116
  {
117
+ "epoch": 0.7424593967517401,
118
+ "grad_norm": 3.109375,
119
+ "learning_rate": 1.9966668164479567e-05,
120
+ "loss": 0.8193,
121
  "step": 80
122
  },
123
  {
124
+ "epoch": 0.7888631090487239,
125
+ "grad_norm": 3.28125,
126
+ "learning_rate": 1.9940769019724926e-05,
127
+ "loss": 0.8732,
128
  "step": 85
129
  },
130
  {
131
+ "epoch": 0.8352668213457076,
132
+ "grad_norm": 3.03125,
133
+ "learning_rate": 1.9907503034107893e-05,
134
+ "loss": 0.8356,
135
  "step": 90
136
  },
137
  {
138
+ "epoch": 0.8816705336426914,
139
+ "grad_norm": 3.0625,
140
+ "learning_rate": 1.9866894860170104e-05,
141
+ "loss": 0.8692,
142
  "step": 95
143
  },
144
  {
145
+ "epoch": 0.9280742459396751,
146
+ "grad_norm": 2.984375,
147
+ "learning_rate": 1.9818974591554668e-05,
148
+ "loss": 0.8007,
149
  "step": 100
150
  },
151
  {
152
+ "epoch": 0.974477958236659,
153
+ "grad_norm": 3.296875,
154
+ "learning_rate": 1.9763777740704572e-05,
155
+ "loss": 0.8366,
156
  "step": 105
157
  },
158
  {
159
+ "epoch": 1.0208816705336428,
160
+ "grad_norm": 2.625,
161
+ "learning_rate": 1.970134521254532e-05,
162
+ "loss": 0.8049,
163
  "step": 110
164
  },
165
  {
166
+ "epoch": 1.0672853828306264,
167
+ "grad_norm": 2.921875,
168
+ "learning_rate": 1.9631723274171412e-05,
169
+ "loss": 0.6126,
170
  "step": 115
171
  },
172
  {
173
+ "epoch": 1.1136890951276102,
174
+ "grad_norm": 2.96875,
175
+ "learning_rate": 1.9554963520559003e-05,
176
+ "loss": 0.6394,
177
  "step": 120
178
  },
179
  {
180
+ "epoch": 1.160092807424594,
181
+ "grad_norm": 3.421875,
182
+ "learning_rate": 1.9471122836330236e-05,
183
+ "loss": 0.6416,
184
  "step": 125
185
  },
186
  {
187
+ "epoch": 1.2064965197215778,
188
+ "grad_norm": 3.40625,
189
+ "learning_rate": 1.9380263353597553e-05,
190
+ "loss": 0.6129,
191
  "step": 130
192
  },
193
  {
194
+ "epoch": 1.2529002320185616,
195
+ "grad_norm": 3.359375,
196
+ "learning_rate": 1.9282452405919235e-05,
197
+ "loss": 0.6533,
198
  "step": 135
199
  },
200
  {
201
+ "epoch": 1.2993039443155452,
202
+ "grad_norm": 3.515625,
203
+ "learning_rate": 1.9177762478400276e-05,
204
+ "loss": 0.6333,
205
  "step": 140
206
  },
207
  {
208
+ "epoch": 1.345707656612529,
209
+ "grad_norm": 2.921875,
210
+ "learning_rate": 1.9066271153975602e-05,
211
+ "loss": 0.6426,
212
  "step": 145
213
  },
214
  {
215
+ "epoch": 1.3921113689095128,
216
+ "grad_norm": 3.421875,
217
+ "learning_rate": 1.8948061055915395e-05,
218
+ "loss": 0.6492,
219
  "step": 150
220
  },
221
  {
222
+ "epoch": 1.4385150812064964,
223
+ "grad_norm": 3.09375,
224
+ "learning_rate": 1.882321978659519e-05,
225
+ "loss": 0.6052,
226
  "step": 155
227
  },
228
  {
229
+ "epoch": 1.4849187935034802,
230
+ "grad_norm": 3.15625,
231
+ "learning_rate": 1.869183986257606e-05,
232
+ "loss": 0.6471,
233
  "step": 160
234
  },
235
  {
236
+ "epoch": 1.531322505800464,
237
+ "grad_norm": 3.15625,
238
+ "learning_rate": 1.8554018646043045e-05,
239
+ "loss": 0.6229,
240
  "step": 165
241
  },
242
  {
243
+ "epoch": 1.5777262180974478,
244
+ "grad_norm": 3.46875,
245
+ "learning_rate": 1.840985827265262e-05,
246
+ "loss": 0.6153,
247
  "step": 170
248
  },
249
  {
250
+ "epoch": 1.6241299303944317,
251
+ "grad_norm": 3.1875,
252
+ "learning_rate": 1.825946557584265e-05,
253
+ "loss": 0.6302,
254
  "step": 175
255
  },
256
  {
257
+ "epoch": 1.6705336426914155,
258
+ "grad_norm": 3.421875,
259
+ "learning_rate": 1.810295200766097e-05,
260
+ "loss": 0.6296,
261
  "step": 180
262
  },
263
  {
264
+ "epoch": 1.716937354988399,
265
+ "grad_norm": 3.328125,
266
+ "learning_rate": 1.794043355617121e-05,
267
+ "loss": 0.6141,
268
  "step": 185
269
  },
270
  {
271
+ "epoch": 1.7633410672853829,
272
  "grad_norm": 3.3125,
273
+ "learning_rate": 1.7772030659497112e-05,
274
+ "loss": 0.6288,
275
  "step": 190
276
  },
277
  {
278
+ "epoch": 1.8097447795823665,
279
+ "grad_norm": 2.984375,
280
+ "learning_rate": 1.7597868116569036e-05,
281
+ "loss": 0.6376,
282
  "step": 195
283
  },
284
  {
285
+ "epoch": 1.8561484918793503,
286
+ "grad_norm": 3.25,
287
+ "learning_rate": 1.7418074994638752e-05,
288
+ "loss": 0.6139,
289
  "step": 200
290
  },
291
  {
292
+ "epoch": 1.902552204176334,
293
+ "grad_norm": 3.609375,
294
+ "learning_rate": 1.7232784533631148e-05,
295
+ "loss": 0.6259,
296
  "step": 205
297
  },
298
  {
299
+ "epoch": 1.948955916473318,
300
+ "grad_norm": 2.9375,
301
+ "learning_rate": 1.7042134047403613e-05,
302
+ "loss": 0.6429,
303
  "step": 210
304
  },
305
  {
306
+ "epoch": 1.9953596287703017,
307
+ "grad_norm": 3.703125,
308
+ "learning_rate": 1.684626482198639e-05,
309
+ "loss": 0.6532,
310
  "step": 215
311
  },
312
  {
313
+ "epoch": 2.0417633410672855,
314
+ "grad_norm": 3.71875,
315
+ "learning_rate": 1.6645322010879242e-05,
316
+ "loss": 0.4533,
317
  "step": 220
318
  },
319
  {
320
+ "epoch": 2.0881670533642693,
321
+ "grad_norm": 4.71875,
322
+ "learning_rate": 1.6439454527482014e-05,
323
+ "loss": 0.3677,
324
  "step": 225
325
  },
326
  {
327
+ "epoch": 2.1345707656612527,
328
+ "grad_norm": 3.703125,
329
+ "learning_rate": 1.6228814934738873e-05,
330
+ "loss": 0.3657,
331
  "step": 230
332
  },
333
  {
334
+ "epoch": 2.1809744779582365,
335
+ "grad_norm": 4.3125,
336
+ "learning_rate": 1.6013559332077945e-05,
337
+ "loss": 0.3359,
338
  "step": 235
339
  },
340
  {
341
+ "epoch": 2.2273781902552203,
342
+ "grad_norm": 5.3125,
343
+ "learning_rate": 1.5793847239730148e-05,
344
+ "loss": 0.3438,
345
  "step": 240
346
  },
347
  {
348
+ "epoch": 2.273781902552204,
349
+ "grad_norm": 4.4375,
350
+ "learning_rate": 1.5569841480512972e-05,
351
+ "loss": 0.3437,
352
  "step": 245
353
  },
354
  {
355
+ "epoch": 2.320185614849188,
356
+ "grad_norm": 4.09375,
357
+ "learning_rate": 1.534170805916681e-05,
358
+ "loss": 0.3474,
359
  "step": 250
360
  },
361
  {
362
+ "epoch": 2.3665893271461718,
363
+ "grad_norm": 4.65625,
364
+ "learning_rate": 1.510961603933324e-05,
365
+ "loss": 0.3529,
366
  "step": 255
367
  },
368
  {
369
+ "epoch": 2.4129930394431556,
370
+ "grad_norm": 3.96875,
371
+ "learning_rate": 1.4873737418266398e-05,
372
+ "loss": 0.3486,
373
  "step": 260
374
  },
375
  {
376
+ "epoch": 2.4593967517401394,
377
+ "grad_norm": 4.15625,
378
+ "learning_rate": 1.4634246999370415e-05,
379
+ "loss": 0.3345,
380
  "step": 265
381
  },
382
  {
383
+ "epoch": 2.505800464037123,
384
+ "grad_norm": 4.125,
385
+ "learning_rate": 1.4391322262657206e-05,
386
+ "loss": 0.3498,
387
  "step": 270
388
  },
389
  {
390
+ "epoch": 2.5522041763341066,
391
+ "grad_norm": 3.71875,
392
+ "learning_rate": 1.4145143233220741e-05,
393
+ "loss": 0.331,
394
  "step": 275
395
  },
396
  {
397
+ "epoch": 2.5986078886310904,
398
+ "grad_norm": 4.28125,
399
+ "learning_rate": 1.3895892347825205e-05,
400
+ "loss": 0.3499,
401
  "step": 280
402
  },
403
  {
404
+ "epoch": 2.645011600928074,
405
+ "grad_norm": 4.78125,
406
+ "learning_rate": 1.3643754319705956e-05,
407
+ "loss": 0.348,
408
  "step": 285
409
  },
410
  {
411
+ "epoch": 2.691415313225058,
412
+ "grad_norm": 4.03125,
413
+ "learning_rate": 1.3388916001683412e-05,
414
+ "loss": 0.3527,
415
  "step": 290
416
  },
417
  {
418
+ "epoch": 2.737819025522042,
419
+ "grad_norm": 4.0625,
420
+ "learning_rate": 1.3131566247691387e-05,
421
+ "loss": 0.3207,
422
  "step": 295
423
  },
424
  {
425
+ "epoch": 2.7842227378190256,
426
+ "grad_norm": 4.25,
427
+ "learning_rate": 1.2871895772822442e-05,
428
+ "loss": 0.3458,
429
  "step": 300
430
  },
431
  {
432
+ "epoch": 2.8306264501160094,
433
+ "grad_norm": 4.0625,
434
+ "learning_rate": 1.261009701199395e-05,
435
+ "loss": 0.3501,
436
  "step": 305
437
  },
438
  {
439
+ "epoch": 2.877030162412993,
440
+ "grad_norm": 3.578125,
441
+ "learning_rate": 1.2346363977339698e-05,
442
+ "loss": 0.3583,
443
  "step": 310
444
  },
445
  {
446
+ "epoch": 2.9234338747099766,
447
+ "grad_norm": 4.03125,
448
+ "learning_rate": 1.208089211443262e-05,
449
+ "loss": 0.3328,
450
  "step": 315
451
  },
452
  {
453
+ "epoch": 2.9698375870069604,
454
+ "grad_norm": 3.890625,
455
+ "learning_rate": 1.1813878157445253e-05,
456
+ "loss": 0.3365,
457
  "step": 320
458
  },
459
  {
460
+ "epoch": 3.0162412993039442,
461
+ "grad_norm": 3.53125,
462
+ "learning_rate": 1.1545519983355255e-05,
463
+ "loss": 0.3156,
464
  "step": 325
465
  },
466
  {
467
+ "epoch": 3.062645011600928,
468
+ "grad_norm": 3.90625,
469
+ "learning_rate": 1.1276016465303989e-05,
470
+ "loss": 0.1769,
471
  "step": 330
472
  },
473
  {
474
+ "epoch": 3.109048723897912,
475
+ "grad_norm": 5.78125,
476
+ "learning_rate": 1.1005567325216946e-05,
477
+ "loss": 0.1737,
478
  "step": 335
479
  },
480
  {
481
+ "epoch": 3.1554524361948957,
482
+ "grad_norm": 4.28125,
483
+ "learning_rate": 1.0734372985795062e-05,
484
+ "loss": 0.165,
485
  "step": 340
486
  },
487
  {
488
+ "epoch": 3.2018561484918795,
489
+ "grad_norm": 3.546875,
490
+ "learning_rate": 1.0462634421986786e-05,
491
+ "loss": 0.1701,
492
  "step": 345
493
  },
494
  {
495
+ "epoch": 3.2482598607888633,
496
+ "grad_norm": 4.46875,
497
+ "learning_rate": 1.0190553012050868e-05,
498
+ "loss": 0.1707,
499
  "step": 350
500
  },
501
  {
502
+ "epoch": 3.2946635730858467,
503
+ "grad_norm": 4.25,
504
+ "learning_rate": 9.918330388320235e-06,
505
+ "loss": 0.1603,
506
  "step": 355
507
  },
508
  {
509
+ "epoch": 3.3410672853828305,
510
+ "grad_norm": 4.0625,
511
+ "learning_rate": 9.646168287777633e-06,
512
+ "loss": 0.1662,
513
  "step": 360
514
  },
515
  {
516
+ "epoch": 3.3874709976798143,
517
+ "grad_norm": 4.46875,
518
+ "learning_rate": 9.374268402553665e-06,
519
+ "loss": 0.159,
520
  "step": 365
521
  },
522
  {
523
+ "epoch": 3.433874709976798,
524
+ "grad_norm": 3.984375,
525
+ "learning_rate": 9.102832230458115e-06,
526
+ "loss": 0.163,
527
  "step": 370
528
  },
529
  {
530
+ "epoch": 3.480278422273782,
531
+ "grad_norm": 4.09375,
532
+ "learning_rate": 8.83206092565522e-06,
533
+ "loss": 0.1591,
534
  "step": 375
535
  },
536
  {
537
+ "epoch": 3.5266821345707657,
538
+ "grad_norm": 4.34375,
539
+ "learning_rate": 8.562155149593673e-06,
540
+ "loss": 0.1706,
541
  "step": 380
542
  },
543
  {
544
+ "epoch": 3.5730858468677495,
545
+ "grad_norm": 4.4375,
546
+ "learning_rate": 8.293314922301715e-06,
547
+ "loss": 0.1595,
548
  "step": 385
549
  },
550
  {
551
+ "epoch": 3.619489559164733,
552
+ "grad_norm": 4.03125,
553
+ "learning_rate": 8.025739474157595e-06,
554
+ "loss": 0.1637,
555
  "step": 390
556
  },
557
  {
558
+ "epoch": 3.6658932714617167,
559
+ "grad_norm": 3.828125,
560
+ "learning_rate": 7.759627098245207e-06,
561
+ "loss": 0.1665,
562
  "step": 395
563
  },
564
  {
565
+ "epoch": 3.7122969837587005,
566
+ "grad_norm": 3.953125,
567
+ "learning_rate": 7.49517500340432e-06,
568
+ "loss": 0.167,
569
  "step": 400
570
  },
571
  {
572
+ "epoch": 3.7587006960556844,
573
  "grad_norm": 3.78125,
574
+ "learning_rate": 7.232579168084344e-06,
575
+ "loss": 0.1533,
576
  "step": 405
577
  },
578
  {
579
+ "epoch": 3.805104408352668,
580
+ "grad_norm": 4.3125,
581
+ "learning_rate": 6.972034195109885e-06,
582
+ "loss": 0.1651,
583
  "step": 410
584
  },
585
  {
586
+ "epoch": 3.851508120649652,
587
+ "grad_norm": 3.640625,
588
+ "learning_rate": 6.713733167465723e-06,
589
+ "loss": 0.1673,
590
  "step": 415
591
  },
592
  {
593
+ "epoch": 3.897911832946636,
594
+ "grad_norm": 4.125,
595
+ "learning_rate": 6.4578675052081395e-06,
596
+ "loss": 0.1576,
597
  "step": 420
598
  },
599
  {
600
+ "epoch": 3.9443155452436196,
601
+ "grad_norm": 3.96875,
602
+ "learning_rate": 6.204626823608584e-06,
603
+ "loss": 0.1688,
604
  "step": 425
605
  },
606
  {
607
+ "epoch": 3.9907192575406034,
608
+ "grad_norm": 3.78125,
609
+ "learning_rate": 5.954198792634782e-06,
610
+ "loss": 0.1571,
611
  "step": 430
612
  },
613
  {
614
+ "epoch": 4.037122969837587,
615
+ "grad_norm": 2.546875,
616
+ "learning_rate": 5.706768997873533e-06,
617
+ "loss": 0.1248,
618
  "step": 435
619
  },
620
  {
621
+ "epoch": 4.083526682134571,
622
+ "grad_norm": 2.1875,
623
+ "learning_rate": 5.462520802998108e-06,
624
+ "loss": 0.0991,
625
  "step": 440
626
  },
627
  {
628
+ "epoch": 4.129930394431555,
629
+ "grad_norm": 3.234375,
630
+ "learning_rate": 5.221635213882295e-06,
631
+ "loss": 0.0951,
632
  "step": 445
633
  },
634
  {
635
+ "epoch": 4.176334106728539,
636
+ "grad_norm": 3.46875,
637
+ "learning_rate": 4.9842907444617415e-06,
638
+ "loss": 0.1023,
639
  "step": 450
640
  },
641
  {
642
+ "epoch": 4.222737819025522,
643
+ "grad_norm": 2.90625,
644
+ "learning_rate": 4.750663284442001e-06,
645
+ "loss": 0.0941,
646
  "step": 455
647
  },
648
  {
649
+ "epoch": 4.269141531322505,
650
+ "grad_norm": 2.984375,
651
+ "learning_rate": 4.52092596895131e-06,
652
+ "loss": 0.0942,
653
  "step": 460
654
  },
655
  {
656
+ "epoch": 4.315545243619489,
657
+ "grad_norm": 2.90625,
658
+ "learning_rate": 4.295249050234738e-06,
659
+ "loss": 0.0941,
660
  "step": 465
661
  },
662
  {
663
+ "epoch": 4.361948955916473,
664
+ "grad_norm": 3.046875,
665
+ "learning_rate": 4.073799771484768e-06,
666
+ "loss": 0.0956,
667
  "step": 470
668
  },
669
  {
670
+ "epoch": 4.408352668213457,
671
+ "grad_norm": 2.578125,
672
+ "learning_rate": 3.8567422429017585e-06,
673
+ "loss": 0.097,
674
  "step": 475
675
  },
676
  {
677
+ "epoch": 4.454756380510441,
678
+ "grad_norm": 2.546875,
679
+ "learning_rate": 3.644237320076256e-06,
680
+ "loss": 0.0963,
681
  "step": 480
682
  },
683
  {
684
+ "epoch": 4.5011600928074245,
685
+ "grad_norm": 2.984375,
686
+ "learning_rate": 3.436442484783138e-06,
687
+ "loss": 0.0993,
688
  "step": 485
689
  },
690
  {
691
+ "epoch": 4.547563805104408,
692
+ "grad_norm": 3.015625,
693
+ "learning_rate": 3.2335117282760563e-06,
694
+ "loss": 0.0954,
695
  "step": 490
696
  },
697
  {
698
+ "epoch": 4.593967517401392,
699
+ "grad_norm": 2.890625,
700
+ "learning_rate": 3.0355954371685948e-06,
701
+ "loss": 0.0967,
702
  "step": 495
703
  },
704
  {
705
+ "epoch": 4.640371229698376,
706
+ "grad_norm": 2.40625,
707
+ "learning_rate": 2.842840281986726e-06,
708
+ "loss": 0.0988,
709
  "step": 500
710
  },
711
  {
712
+ "epoch": 4.68677494199536,
713
+ "grad_norm": 2.78125,
714
+ "learning_rate": 2.6553891084751604e-06,
715
+ "loss": 0.0976,
716
  "step": 505
717
  },
718
  {
719
+ "epoch": 4.7331786542923435,
720
+ "grad_norm": 3.578125,
721
+ "learning_rate": 2.473380831738146e-06,
722
+ "loss": 0.094,
723
  "step": 510
724
  },
725
  {
726
+ "epoch": 4.779582366589327,
727
+ "grad_norm": 2.5625,
728
+ "learning_rate": 2.2969503332931754e-06,
729
+ "loss": 0.0956,
730
  "step": 515
731
  },
732
  {
733
+ "epoch": 4.825986078886311,
734
+ "grad_norm": 3.0,
735
+ "learning_rate": 2.126228361113839e-06,
736
+ "loss": 0.0988,
737
  "step": 520
738
  },
739
  {
740
+ "epoch": 4.872389791183295,
741
+ "grad_norm": 2.703125,
742
+ "learning_rate": 1.9613414327359824e-06,
743
+ "loss": 0.098,
744
  "step": 525
745
  },
746
  {
747
+ "epoch": 4.918793503480279,
748
+ "grad_norm": 2.71875,
749
+ "learning_rate": 1.8024117414989007e-06,
750
+ "loss": 0.0968,
751
  "step": 530
752
  },
753
  {
754
+ "epoch": 4.965197215777263,
755
+ "grad_norm": 2.75,
756
+ "learning_rate": 1.649557065991081e-06,
757
+ "loss": 0.0938,
758
  "step": 535
759
  },
760
  {
761
+ "epoch": 5.0116009280742455,
762
+ "grad_norm": 1.75,
763
+ "learning_rate": 1.5028906827676148e-06,
764
+ "loss": 0.0965,
765
  "step": 540
766
  },
767
  {
768
+ "epoch": 5.058004640371229,
769
+ "grad_norm": 1.84375,
770
+ "learning_rate": 1.3625212824039468e-06,
771
+ "loss": 0.0829,
772
  "step": 545
  },
  {
+ "epoch": 5.104408352668213,
+ "grad_norm": 2.015625,
+ "learning_rate": 1.228552888948149e-06,
+ "loss": 0.0819,
  "step": 550
  },
  {
+ "epoch": 5.150812064965197,
+ "grad_norm": 2.078125,
+ "learning_rate": 1.1010847828314708e-06,
+ "loss": 0.082,
  "step": 555
  },
  {
+ "epoch": 5.197215777262181,
+ "grad_norm": 2.109375,
+ "learning_rate": 9.80211427294222e-07,
+ "loss": 0.0849,
  "step": 560
  },
  {
+ "epoch": 5.243619489559165,
+ "grad_norm": 1.875,
+ "learning_rate": 8.660223983815708e-07,
+ "loss": 0.0807,
  "step": 565
  },
  {
+ "epoch": 5.290023201856148,
+ "grad_norm": 2.359375,
+ "learning_rate": 7.586023185611136e-07,
+ "loss": 0.0866,
  "step": 570
  },
  {
+ "epoch": 5.336426914153132,
+ "grad_norm": 2.078125,
+ "learning_rate": 6.580307940113972e-07,
+ "loss": 0.0844,
  "step": 575
  },
  {
+ "epoch": 5.382830626450116,
+ "grad_norm": 1.953125,
+ "learning_rate": 5.643823556278849e-07,
+ "loss": 0.0846,
  "step": 580
  },
  {
+ "epoch": 5.4292343387471,
+ "grad_norm": 2.15625,
+ "learning_rate": 4.777264037900841e-07,
+ "loss": 0.0841,
  "step": 585
  },
  {
+ "epoch": 5.475638051044084,
+ "grad_norm": 1.640625,
+ "learning_rate": 3.981271569307654e-07,
+ "loss": 0.0816,
  "step": 590
  },
  {
+ "epoch": 5.522041763341067,
+ "grad_norm": 1.7578125,
+ "learning_rate": 3.2564360394537696e-07,
+ "loss": 0.0845,
  "step": 595
  },
  {
+ "epoch": 5.568445475638051,
+ "grad_norm": 2.109375,
+ "learning_rate": 2.6032946047693794e-07,
+ "loss": 0.0823,
  "step": 600
  },
  {
+ "epoch": 5.614849187935035,
+ "grad_norm": 2.03125,
+ "learning_rate": 2.02233129108792e-07,
+ "loss": 0.0829,
  "step": 605
  },
  {
+ "epoch": 5.661252900232019,
+ "grad_norm": 2.046875,
+ "learning_rate": 1.5139766349474004e-07,
+ "loss": 0.0829,
  "step": 610
  },
  {
+ "epoch": 5.707656612529003,
+ "grad_norm": 2.078125,
+ "learning_rate": 1.0786073645311035e-07,
+ "loss": 0.085,
  "step": 615
  },
  {
+ "epoch": 5.754060324825986,
+ "grad_norm": 2.078125,
+ "learning_rate": 7.165461204843738e-08,
+ "loss": 0.0842,
  "step": 620
  },
  {
+ "epoch": 5.800464037122969,
+ "grad_norm": 2.265625,
+ "learning_rate": 4.2806121681409076e-08,
+ "loss": 0.0809,
  "step": 625
  },
  {
+ "epoch": 5.846867749419953,
+ "grad_norm": 1.90625,
+ "learning_rate": 2.1336644204834613e-08,
+ "loss": 0.08,
  "step": 630
  },
  {
+ "epoch": 5.893271461716937,
+ "grad_norm": 2.390625,
+ "learning_rate": 7.262090080331075e-09,
+ "loss": 0.0871,
  "step": 635
  },
  {
+ "epoch": 5.939675174013921,
+ "grad_norm": 1.921875,
+ "learning_rate": 5.92889587515133e-10,
+ "loss": 0.0866,
  "step": 640
  },
  {
+ "epoch": 5.958236658932715,
+ "step": 642,
+ "total_flos": 6.526533872789914e+16,
+ "train_loss": 0.40725265422435564,
+ "train_runtime": 2315.0788,
+ "train_samples_per_second": 8.936,
+ "train_steps_per_second": 0.277
  }
  ],
  "logging_steps": 5,
+ "max_steps": 642,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 6,
  "save_steps": 999999,

  "attributes": {}
  }
  },
+ "total_flos": 6.526533872789914e+16,
  "train_batch_size": 8,
  "trial_name": null,
  "trial_params": null