pszemraj committed on
Commit ee1a1f1
1 Parent(s): 2245df2

End of training

Files changed (6)
  1. README.md +2 -2
  2. all_results.json +16 -0
  3. eval_results.json +10 -0
  4. predict_results.txt +1254 -0
  5. train_results.json +10 -0
  6. trainer_state.json +1597 -0
README.md CHANGED
@@ -17,9 +17,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [pszemraj/MiniLMv2-L6-H384_R-fineweb-100k](https://huggingface.co/pszemraj/MiniLMv2-L6-H384_R-fineweb-100k) on an unknown dataset.
 It achieves the following results on the evaluation set:
- - Loss: 0.0185
+ - Loss: 0.0162
  - Accuracy: 0.996
- - Num Input Tokens Seen: 57341952
+ - Num Input Tokens Seen: 61536256
 
 ## Model description
 
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+     "epoch": 1.9984038308060654,
+     "eval_accuracy": 0.996,
+     "eval_loss": 0.016208015382289886,
+     "eval_runtime": 1.136,
+     "eval_samples": 500,
+     "eval_samples_per_second": 440.122,
+     "eval_steps_per_second": 55.455,
+     "num_input_tokens_seen": 61536256,
+     "total_flos": 3986132331896832.0,
+     "train_loss": 0.0478181641492338,
+     "train_runtime": 541.4691,
+     "train_samples": 60140,
+     "train_samples_per_second": 222.136,
+     "train_steps_per_second": 3.468
+ }
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "epoch": 1.9984038308060654,
+     "eval_accuracy": 0.996,
+     "eval_loss": 0.016208015382289886,
+     "eval_runtime": 1.136,
+     "eval_samples": 500,
+     "eval_samples_per_second": 440.122,
+     "eval_steps_per_second": 55.455,
+     "num_input_tokens_seen": 61536256
+ }
predict_results.txt ADDED
@@ -0,0 +1,1254 @@
+ index prediction
+ 0 clean
+ 1 noisy
+ 2 clean
+ 3 clean
+ 4 clean
+ 5 noisy
+ 6 clean
+ 7 noisy
+ 8 noisy
+ 9 noisy
+ 10 noisy
+ 11 clean
+ 12 noisy
+ 13 clean
+ 14 clean
+ 15 clean
+ 16 clean
+ 17 noisy
+ 18 noisy
+ 19 noisy
+ 20 noisy
+ 21 clean
+ 22 clean
+ 23 noisy
+ 24 noisy
+ 25 noisy
+ 26 noisy
+ 27 noisy
+ 28 clean
+ 29 noisy
+ 30 clean
+ 31 clean
+ 32 noisy
+ 33 clean
+ 34 noisy
+ 35 clean
+ 36 clean
+ 37 clean
+ 38 clean
+ 39 clean
+ 40 noisy
+ 41 clean
+ 42 noisy
+ 43 clean
+ 44 noisy
+ 45 clean
+ 46 noisy
+ 47 clean
+ 48 noisy
+ 49 clean
+ 50 clean
+ 51 noisy
+ 52 noisy
+ 53 noisy
+ 54 noisy
+ 55 noisy
+ 56 noisy
+ 57 noisy
+ 58 clean
+ 59 clean
+ 60 noisy
+ 61 clean
+ 62 clean
+ 63 clean
+ 64 noisy
+ 65 noisy
+ 66 clean
+ 67 clean
+ 68 clean
+ 69 noisy
+ 70 clean
+ 71 noisy
+ 72 noisy
+ 73 clean
+ 74 clean
+ 75 clean
+ 76 clean
+ 77 noisy
+ 78 noisy
+ 79 noisy
+ 80 clean
+ 81 noisy
+ 82 noisy
+ 83 clean
+ 84 noisy
+ 85 noisy
+ 86 clean
+ 87 clean
+ 88 noisy
+ 89 clean
+ 90 clean
+ 91 clean
+ 92 clean
+ 93 noisy
+ 94 noisy
+ 95 clean
+ 96 clean
+ 97 clean
+ 98 clean
+ 99 noisy
+ 100 noisy
+ 101 noisy
+ 102 clean
+ 103 noisy
+ 104 noisy
+ 105 noisy
+ 106 clean
+ 107 noisy
+ 108 clean
+ 109 clean
+ 110 noisy
+ 111 noisy
+ 112 noisy
+ 113 noisy
+ 114 clean
+ 115 noisy
+ 116 noisy
+ 117 clean
+ 118 clean
+ 119 clean
+ 120 noisy
+ 121 noisy
+ 122 noisy
+ 123 clean
+ 124 clean
+ 125 clean
+ 126 clean
+ 127 noisy
+ 128 clean
+ 129 noisy
+ 130 clean
+ 131 noisy
+ 132 clean
+ 133 clean
+ 134 clean
+ 135 noisy
+ 136 clean
+ 137 clean
+ 138 noisy
+ 139 noisy
+ 140 noisy
+ 141 noisy
+ 142 noisy
+ 143 clean
+ 144 noisy
+ 145 noisy
+ 146 noisy
+ 147 clean
+ 148 noisy
+ 149 noisy
+ 150 noisy
+ 151 noisy
+ 152 noisy
+ 153 noisy
+ 154 clean
+ 155 noisy
+ 156 noisy
+ 157 noisy
+ 158 clean
+ 159 clean
+ 160 clean
+ 161 noisy
+ 162 clean
+ 163 clean
+ 164 noisy
+ 165 clean
+ 166 noisy
+ 167 noisy
+ 168 clean
+ 169 noisy
+ 170 noisy
+ 171 noisy
+ 172 noisy
+ 173 clean
+ 174 clean
+ 175 noisy
+ 176 clean
+ 177 clean
+ 178 clean
+ 179 noisy
+ 180 noisy
+ 181 clean
+ 182 noisy
+ 183 noisy
+ 184 noisy
+ 185 noisy
+ 186 clean
+ 187 clean
+ 188 noisy
+ 189 noisy
+ 190 clean
+ 191 noisy
+ 192 clean
+ 193 clean
+ 194 noisy
+ 195 clean
+ 196 clean
+ 197 noisy
+ 198 noisy
+ 199 clean
+ 200 noisy
+ 201 clean
+ 202 noisy
+ 203 noisy
+ 204 clean
+ 205 clean
+ 206 clean
+ 207 noisy
+ 208 noisy
+ 209 noisy
+ 210 clean
+ 211 clean
+ 212 clean
+ 213 clean
+ 214 noisy
+ 215 clean
+ 216 noisy
+ 217 noisy
+ 218 noisy
+ 219 noisy
+ 220 clean
+ 221 noisy
+ 222 clean
+ 223 clean
+ 224 clean
+ 225 noisy
+ 226 noisy
+ 227 clean
+ 228 noisy
+ 229 clean
+ 230 noisy
+ 231 noisy
+ 232 clean
+ 233 noisy
+ 234 clean
+ 235 noisy
+ 236 noisy
+ 237 clean
+ 238 noisy
+ 239 clean
+ 240 noisy
+ 241 clean
+ 242 noisy
+ 243 clean
+ 244 noisy
+ 245 noisy
+ 246 clean
+ 247 clean
+ 248 clean
+ 249 noisy
+ 250 noisy
+ 251 clean
+ 252 clean
+ 253 noisy
+ 254 noisy
+ 255 clean
+ 256 noisy
+ 257 clean
+ 258 noisy
+ 259 clean
+ 260 clean
+ 261 clean
+ 262 noisy
+ 263 noisy
+ 264 noisy
+ 265 noisy
+ 266 noisy
+ 267 clean
+ 268 noisy
+ 269 noisy
+ 270 clean
+ 271 clean
+ 272 noisy
+ 273 noisy
+ 274 clean
+ 275 noisy
+ 276 noisy
+ 277 noisy
+ 278 noisy
+ 279 noisy
+ 280 noisy
+ 281 clean
+ 282 noisy
+ 283 clean
+ 284 clean
+ 285 clean
+ 286 clean
+ 287 clean
+ 288 noisy
+ 289 noisy
+ 290 clean
+ 291 noisy
+ 292 clean
+ 293 noisy
+ 294 clean
+ 295 clean
+ 296 clean
+ 297 noisy
+ 298 noisy
+ 299 clean
+ 300 clean
+ 301 noisy
+ 302 noisy
+ 303 clean
+ 304 noisy
+ 305 noisy
+ 306 noisy
+ 307 noisy
+ 308 noisy
+ 309 noisy
+ 310 clean
+ 311 noisy
+ 312 clean
+ 313 clean
+ 314 clean
+ 315 noisy
+ 316 noisy
+ 317 clean
+ 318 noisy
+ 319 clean
+ 320 noisy
+ 321 clean
+ 322 clean
+ 323 clean
+ 324 clean
+ 325 noisy
+ 326 clean
+ 327 clean
+ 328 clean
+ 329 clean
+ 330 noisy
+ 331 noisy
+ 332 clean
+ 333 noisy
+ 334 noisy
+ 335 clean
+ 336 noisy
+ 337 clean
+ 338 clean
+ 339 clean
+ 340 noisy
+ 341 noisy
+ 342 noisy
+ 343 noisy
+ 344 noisy
+ 345 noisy
+ 346 clean
+ 347 clean
+ 348 clean
+ 349 clean
+ 350 clean
+ 351 clean
+ 352 noisy
+ 353 noisy
+ 354 noisy
+ 355 clean
+ 356 noisy
+ 357 noisy
+ 358 noisy
+ 359 noisy
+ 360 noisy
+ 361 clean
+ 362 clean
+ 363 clean
+ 364 clean
+ 365 noisy
+ 366 noisy
+ 367 clean
+ 368 clean
+ 369 clean
+ 370 noisy
+ 371 clean
+ 372 noisy
+ 373 clean
+ 374 noisy
+ 375 clean
+ 376 noisy
+ 377 clean
+ 378 noisy
+ 379 noisy
+ 380 noisy
+ 381 clean
+ 382 clean
+ 383 clean
+ 384 clean
+ 385 noisy
+ 386 noisy
+ 387 clean
+ 388 noisy
+ 389 noisy
+ 390 clean
+ 391 noisy
+ 392 clean
+ 393 clean
+ 394 noisy
+ 395 clean
+ 396 noisy
+ 397 noisy
+ 398 noisy
+ 399 clean
+ 400 clean
+ 401 clean
+ 402 noisy
+ 403 noisy
+ 404 noisy
+ 405 noisy
+ 406 noisy
+ 407 noisy
+ 408 clean
+ 409 clean
+ 410 noisy
+ 411 clean
+ 412 clean
+ 413 noisy
+ 414 clean
+ 415 clean
+ 416 clean
+ 417 noisy
+ 418 noisy
+ 419 clean
+ 420 clean
+ 421 clean
+ 422 clean
+ 423 noisy
+ 424 noisy
+ 425 clean
+ 426 noisy
+ 427 noisy
+ 428 noisy
+ 429 noisy
+ 430 noisy
+ 431 clean
+ 432 noisy
+ 433 noisy
+ 434 clean
+ 435 noisy
+ 436 clean
+ 437 clean
+ 438 noisy
+ 439 clean
+ 440 clean
+ 441 clean
+ 442 noisy
+ 443 noisy
+ 444 clean
+ 445 clean
+ 446 noisy
+ 447 noisy
+ 448 clean
+ 449 clean
+ 450 noisy
+ 451 noisy
+ 452 clean
+ 453 noisy
+ 454 noisy
+ 455 noisy
+ 456 noisy
+ 457 noisy
+ 458 noisy
+ 459 clean
+ 460 clean
+ 461 clean
+ 462 noisy
+ 463 noisy
+ 464 clean
+ 465 noisy
+ 466 clean
+ 467 clean
+ 468 noisy
+ 469 clean
+ 470 noisy
+ 471 clean
+ 472 clean
+ 473 clean
+ 474 noisy
+ 475 clean
+ 476 clean
+ 477 noisy
+ 478 clean
+ 479 clean
+ 480 clean
+ 481 noisy
+ 482 noisy
+ 483 noisy
+ 484 clean
+ 485 clean
+ 486 noisy
+ 487 clean
+ 488 clean
+ 489 noisy
+ 490 noisy
+ 491 clean
+ 492 clean
+ 493 clean
+ 494 clean
+ 495 noisy
+ 496 clean
+ 497 clean
+ 498 clean
+ 499 noisy
+ 500 clean
+ 501 noisy
+ 502 noisy
+ 503 clean
+ 504 clean
+ 505 clean
+ 506 clean
+ 507 clean
+ 508 noisy
+ 509 clean
+ 510 clean
+ 511 clean
+ 512 clean
+ 513 noisy
+ 514 noisy
+ 515 noisy
+ 516 noisy
+ 517 clean
+ 518 noisy
+ 519 noisy
+ 520 noisy
+ 521 noisy
+ 522 noisy
+ 523 clean
+ 524 noisy
+ 525 clean
+ 526 clean
+ 527 noisy
+ 528 clean
+ 529 noisy
+ 530 clean
+ 531 noisy
+ 532 noisy
+ 533 noisy
+ 534 clean
+ 535 noisy
+ 536 noisy
+ 537 noisy
+ 538 clean
+ 539 noisy
+ 540 noisy
+ 541 clean
+ 542 noisy
+ 543 clean
+ 544 noisy
+ 545 clean
+ 546 noisy
+ 547 clean
+ 548 noisy
+ 549 clean
+ 550 clean
+ 551 noisy
+ 552 clean
+ 553 noisy
+ 554 noisy
+ 555 noisy
+ 556 noisy
+ 557 noisy
+ 558 clean
+ 559 noisy
+ 560 noisy
+ 561 noisy
+ 562 noisy
+ 563 noisy
+ 564 noisy
+ 565 noisy
+ 566 clean
+ 567 noisy
+ 568 clean
+ 569 clean
+ 570 noisy
+ 571 clean
+ 572 noisy
+ 573 noisy
+ 574 clean
+ 575 clean
+ 576 clean
+ 577 clean
+ 578 clean
+ 579 noisy
+ 580 clean
+ 581 clean
+ 582 noisy
+ 583 clean
+ 584 noisy
+ 585 noisy
+ 586 clean
+ 587 noisy
+ 588 clean
+ 589 noisy
+ 590 clean
+ 591 clean
+ 592 noisy
+ 593 noisy
+ 594 noisy
+ 595 noisy
+ 596 clean
+ 597 noisy
+ 598 clean
+ 599 noisy
+ 600 clean
+ 601 clean
+ 602 clean
+ 603 clean
+ 604 noisy
+ 605 noisy
+ 606 clean
+ 607 noisy
+ 608 noisy
+ 609 noisy
+ 610 noisy
+ 611 clean
+ 612 noisy
+ 613 clean
+ 614 noisy
+ 615 clean
+ 616 noisy
+ 617 clean
+ 618 clean
+ 619 clean
+ 620 noisy
+ 621 clean
+ 622 clean
+ 623 noisy
+ 624 noisy
+ 625 clean
+ 626 clean
+ 627 noisy
+ 628 clean
+ 629 noisy
+ 630 noisy
+ 631 clean
+ 632 noisy
+ 633 noisy
+ 634 clean
+ 635 clean
+ 636 clean
+ 637 clean
+ 638 noisy
+ 639 clean
+ 640 noisy
+ 641 noisy
+ 642 noisy
+ 643 clean
+ 644 clean
+ 645 clean
+ 646 noisy
+ 647 noisy
+ 648 clean
+ 649 clean
+ 650 noisy
+ 651 noisy
+ 652 noisy
+ 653 noisy
+ 654 noisy
+ 655 clean
+ 656 noisy
+ 657 clean
+ 658 clean
+ 659 clean
+ 660 clean
+ 661 clean
+ 662 noisy
+ 663 clean
+ 664 noisy
+ 665 noisy
+ 666 noisy
+ 667 clean
+ 668 clean
+ 669 noisy
+ 670 noisy
+ 671 noisy
+ 672 noisy
+ 673 noisy
+ 674 noisy
+ 675 noisy
+ 676 noisy
+ 677 clean
+ 678 clean
+ 679 noisy
+ 680 clean
+ 681 clean
+ 682 noisy
+ 683 clean
+ 684 clean
+ 685 clean
+ 686 noisy
+ 687 clean
+ 688 clean
+ 689 noisy
+ 690 clean
+ 691 noisy
+ 692 noisy
+ 693 noisy
+ 694 clean
+ 695 clean
+ 696 noisy
+ 697 noisy
+ 698 clean
+ 699 noisy
+ 700 noisy
+ 701 noisy
+ 702 clean
+ 703 clean
+ 704 noisy
+ 705 noisy
+ 706 noisy
+ 707 noisy
+ 708 noisy
+ 709 clean
+ 710 clean
+ 711 noisy
+ 712 noisy
+ 713 clean
+ 714 noisy
+ 715 clean
+ 716 noisy
+ 717 clean
+ 718 noisy
+ 719 clean
+ 720 clean
+ 721 noisy
+ 722 noisy
+ 723 noisy
+ 724 clean
+ 725 clean
+ 726 noisy
+ 727 noisy
+ 728 clean
+ 729 clean
+ 730 noisy
+ 731 noisy
+ 732 noisy
+ 733 noisy
+ 734 noisy
+ 735 clean
+ 736 clean
+ 737 noisy
+ 738 noisy
+ 739 clean
+ 740 clean
+ 741 clean
+ 742 noisy
+ 743 clean
+ 744 clean
+ 745 clean
+ 746 noisy
+ 747 clean
+ 748 noisy
+ 749 clean
+ 750 clean
+ 751 clean
+ 752 noisy
+ 753 clean
+ 754 noisy
+ 755 clean
+ 756 clean
+ 757 noisy
+ 758 clean
+ 759 noisy
+ 760 clean
+ 761 clean
+ 762 noisy
+ 763 clean
+ 764 noisy
+ 765 clean
+ 766 clean
+ 767 clean
+ 768 noisy
+ 769 clean
+ 770 noisy
+ 771 clean
+ 772 noisy
+ 773 clean
+ 774 clean
+ 775 clean
+ 776 noisy
+ 777 clean
+ 778 clean
+ 779 noisy
+ 780 clean
+ 781 clean
+ 782 clean
+ 783 noisy
+ 784 noisy
+ 785 clean
+ 786 noisy
+ 787 noisy
+ 788 noisy
+ 789 clean
+ 790 clean
+ 791 noisy
+ 792 noisy
+ 793 clean
+ 794 clean
+ 795 clean
+ 796 noisy
+ 797 clean
+ 798 clean
+ 799 noisy
+ 800 noisy
+ 801 clean
+ 802 clean
+ 803 clean
+ 804 noisy
+ 805 noisy
+ 806 noisy
+ 807 clean
+ 808 noisy
+ 809 noisy
+ 810 noisy
+ 811 noisy
+ 812 noisy
+ 813 noisy
+ 814 noisy
+ 815 clean
+ 816 noisy
+ 817 clean
+ 818 noisy
+ 819 clean
+ 820 clean
+ 821 noisy
+ 822 clean
+ 823 noisy
+ 824 clean
+ 825 noisy
+ 826 clean
+ 827 clean
+ 828 clean
+ 829 noisy
+ 830 clean
+ 831 clean
+ 832 clean
+ 833 noisy
+ 834 noisy
+ 835 noisy
+ 836 clean
+ 837 noisy
+ 838 noisy
+ 839 noisy
+ 840 noisy
+ 841 noisy
+ 842 clean
+ 843 noisy
+ 844 noisy
+ 845 clean
+ 846 clean
+ 847 noisy
+ 848 clean
+ 849 noisy
+ 850 noisy
+ 851 noisy
+ 852 clean
+ 853 noisy
+ 854 noisy
+ 855 clean
+ 856 clean
+ 857 noisy
+ 858 clean
+ 859 clean
+ 860 noisy
+ 861 noisy
+ 862 noisy
+ 863 noisy
+ 864 noisy
+ 865 noisy
+ 866 clean
+ 867 clean
+ 868 noisy
+ 869 clean
+ 870 clean
+ 871 clean
+ 872 noisy
+ 873 noisy
+ 874 noisy
+ 875 clean
+ 876 clean
+ 877 clean
+ 878 noisy
+ 879 clean
+ 880 noisy
+ 881 noisy
+ 882 clean
+ 883 noisy
+ 884 clean
+ 885 clean
+ 886 clean
+ 887 clean
+ 888 noisy
+ 889 clean
+ 890 noisy
+ 891 noisy
+ 892 clean
+ 893 noisy
+ 894 clean
+ 895 clean
+ 896 clean
+ 897 noisy
+ 898 noisy
+ 899 clean
+ 900 clean
+ 901 clean
+ 902 noisy
+ 903 noisy
+ 904 noisy
+ 905 clean
+ 906 noisy
+ 907 noisy
+ 908 clean
+ 909 clean
+ 910 clean
+ 911 clean
+ 912 noisy
+ 913 clean
+ 914 noisy
+ 915 clean
+ 916 noisy
+ 917 clean
+ 918 noisy
+ 919 noisy
+ 920 noisy
+ 921 noisy
+ 922 noisy
+ 923 noisy
+ 924 clean
+ 925 noisy
+ 926 clean
+ 927 clean
+ 928 noisy
+ 929 noisy
+ 930 noisy
+ 931 noisy
+ 932 clean
+ 933 clean
+ 934 noisy
+ 935 noisy
+ 936 noisy
+ 937 clean
+ 938 clean
+ 939 noisy
+ 940 noisy
+ 941 noisy
+ 942 clean
+ 943 clean
+ 944 clean
+ 945 noisy
+ 946 noisy
+ 947 clean
+ 948 clean
+ 949 clean
+ 950 clean
+ 951 noisy
+ 952 clean
+ 953 noisy
+ 954 noisy
+ 955 clean
+ 956 noisy
+ 957 noisy
+ 958 clean
+ 959 noisy
+ 960 clean
+ 961 noisy
+ 962 clean
+ 963 clean
+ 964 clean
+ 965 clean
+ 966 noisy
+ 967 clean
+ 968 noisy
+ 969 clean
+ 970 noisy
+ 971 clean
+ 972 noisy
+ 973 clean
+ 974 clean
+ 975 clean
+ 976 clean
+ 977 clean
+ 978 noisy
+ 979 noisy
+ 980 clean
+ 981 clean
+ 982 noisy
+ 983 clean
+ 984 clean
+ 985 noisy
+ 986 clean
+ 987 clean
+ 988 clean
+ 989 clean
+ 990 clean
+ 991 clean
+ 992 noisy
+ 993 noisy
+ 994 clean
+ 995 noisy
+ 996 clean
+ 997 clean
+ 998 clean
+ 999 clean
+ 1000 noisy
+ 1001 clean
+ 1002 clean
+ 1003 clean
+ 1004 clean
+ 1005 clean
+ 1006 clean
+ 1007 noisy
+ 1008 clean
+ 1009 clean
+ 1010 noisy
+ 1011 noisy
+ 1012 clean
+ 1013 clean
+ 1014 noisy
+ 1015 clean
+ 1016 clean
+ 1017 noisy
+ 1018 noisy
+ 1019 clean
+ 1020 clean
+ 1021 noisy
+ 1022 clean
+ 1023 clean
+ 1024 noisy
+ 1025 clean
+ 1026 clean
+ 1027 clean
+ 1028 noisy
+ 1029 noisy
+ 1030 clean
+ 1031 clean
+ 1032 clean
+ 1033 clean
+ 1034 noisy
+ 1035 noisy
+ 1036 noisy
+ 1037 noisy
+ 1038 noisy
+ 1039 clean
+ 1040 clean
+ 1041 noisy
+ 1042 noisy
+ 1043 noisy
+ 1044 noisy
+ 1045 noisy
+ 1046 noisy
+ 1047 clean
+ 1048 clean
+ 1049 noisy
+ 1050 noisy
+ 1051 clean
+ 1052 clean
+ 1053 clean
+ 1054 clean
+ 1055 clean
+ 1056 clean
+ 1057 noisy
+ 1058 clean
+ 1059 clean
+ 1060 clean
+ 1061 clean
+ 1062 clean
+ 1063 noisy
+ 1064 noisy
+ 1065 noisy
+ 1066 noisy
+ 1067 noisy
+ 1068 noisy
+ 1069 clean
+ 1070 clean
+ 1071 noisy
+ 1072 noisy
+ 1073 noisy
+ 1074 clean
+ 1075 clean
+ 1076 noisy
+ 1077 clean
+ 1078 noisy
+ 1079 clean
+ 1080 clean
+ 1081 clean
+ 1082 noisy
+ 1083 noisy
+ 1084 noisy
+ 1085 clean
+ 1086 noisy
+ 1087 noisy
+ 1088 clean
+ 1089 clean
+ 1090 clean
+ 1091 noisy
+ 1092 clean
+ 1093 noisy
+ 1094 clean
+ 1095 clean
+ 1096 noisy
+ 1097 clean
+ 1098 clean
+ 1099 clean
+ 1100 clean
+ 1101 noisy
+ 1102 noisy
+ 1103 clean
+ 1104 clean
+ 1105 noisy
+ 1106 clean
+ 1107 noisy
+ 1108 clean
+ 1109 noisy
+ 1110 noisy
+ 1111 noisy
+ 1112 clean
+ 1113 clean
+ 1114 noisy
+ 1115 noisy
+ 1116 noisy
+ 1117 noisy
+ 1118 clean
+ 1119 clean
+ 1120 noisy
+ 1121 noisy
+ 1122 noisy
+ 1123 noisy
+ 1124 clean
+ 1125 clean
+ 1126 noisy
+ 1127 clean
+ 1128 noisy
+ 1129 clean
+ 1130 noisy
+ 1131 noisy
+ 1132 noisy
+ 1133 clean
+ 1134 noisy
+ 1135 clean
+ 1136 clean
+ 1137 noisy
+ 1138 noisy
+ 1139 clean
+ 1140 clean
+ 1141 noisy
+ 1142 clean
+ 1143 clean
+ 1144 noisy
+ 1145 noisy
+ 1146 clean
+ 1147 clean
+ 1148 noisy
+ 1149 clean
+ 1150 clean
+ 1151 clean
+ 1152 noisy
+ 1153 clean
+ 1154 clean
+ 1155 clean
+ 1156 clean
+ 1157 noisy
+ 1158 noisy
+ 1159 noisy
+ 1160 noisy
+ 1161 noisy
+ 1162 noisy
+ 1163 clean
+ 1164 clean
+ 1165 noisy
+ 1166 noisy
+ 1167 noisy
+ 1168 clean
+ 1169 clean
+ 1170 noisy
+ 1171 noisy
+ 1172 noisy
+ 1173 clean
+ 1174 noisy
+ 1175 noisy
+ 1176 clean
+ 1177 noisy
+ 1178 clean
+ 1179 clean
+ 1180 noisy
+ 1181 clean
+ 1182 clean
+ 1183 noisy
+ 1184 noisy
+ 1185 noisy
+ 1186 clean
+ 1187 clean
+ 1188 noisy
+ 1189 clean
+ 1190 noisy
+ 1191 noisy
+ 1192 clean
+ 1193 clean
+ 1194 noisy
+ 1195 clean
+ 1196 clean
+ 1197 clean
+ 1198 clean
+ 1199 noisy
+ 1200 clean
+ 1201 noisy
+ 1202 noisy
+ 1203 clean
+ 1204 clean
+ 1205 noisy
+ 1206 noisy
+ 1207 noisy
+ 1208 noisy
+ 1209 noisy
+ 1210 noisy
+ 1211 clean
+ 1212 clean
+ 1213 clean
+ 1214 clean
+ 1215 noisy
+ 1216 noisy
+ 1217 clean
+ 1218 noisy
+ 1219 clean
+ 1220 clean
+ 1221 clean
+ 1222 noisy
+ 1223 clean
+ 1224 noisy
+ 1225 clean
+ 1226 clean
+ 1227 noisy
+ 1228 clean
+ 1229 clean
+ 1230 clean
+ 1231 clean
+ 1232 clean
+ 1233 clean
+ 1234 noisy
+ 1235 noisy
+ 1236 noisy
+ 1237 clean
+ 1238 clean
+ 1239 noisy
+ 1240 clean
+ 1241 noisy
+ 1242 noisy
+ 1243 clean
+ 1244 clean
+ 1245 noisy
+ 1246 noisy
+ 1247 clean
+ 1248 noisy
+ 1249 clean
+ 1250 clean
+ 1251 clean
+ 1252 clean
train_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "epoch": 1.9984038308060654,
+     "num_input_tokens_seen": 61536256,
+     "total_flos": 3986132331896832.0,
+     "train_loss": 0.0478181641492338,
+     "train_runtime": 541.4691,
+     "train_samples": 60140,
+     "train_samples_per_second": 222.136,
+     "train_steps_per_second": 3.468
+ }
trainer_state.json ADDED
@@ -0,0 +1,1597 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 1.9984038308060654,
+   "eval_steps": 250,
+   "global_step": 1878,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.010641127959563713,
+       "grad_norm": 0.8616393804550171,
+       "learning_rate": 3.1914893617021277e-06,
+       "loss": 0.6847,
+       "num_input_tokens_seen": 327680,
+       "step": 10
+     },
+     {
+       "epoch": 0.021282255919127427,
+       "grad_norm": 1.3775863647460938,
+       "learning_rate": 6.3829787234042555e-06,
+       "loss": 0.6579,
+       "num_input_tokens_seen": 655360,
+       "step": 20
+     },
+     {
+       "epoch": 0.03192338387869114,
+       "grad_norm": 2.395984411239624,
+       "learning_rate": 9.574468085106385e-06,
+       "loss": 0.6053,
+       "num_input_tokens_seen": 983040,
+       "step": 30
+     },
+     {
+       "epoch": 0.042564511838254854,
+       "grad_norm": 1.8644745349884033,
+       "learning_rate": 1.2765957446808511e-05,
+       "loss": 0.53,
+       "num_input_tokens_seen": 1310720,
+       "step": 40
+     },
+     {
+       "epoch": 0.05320563979781857,
+       "grad_norm": 2.1690289974212646,
+       "learning_rate": 1.5957446808510637e-05,
+       "loss": 0.4419,
+       "num_input_tokens_seen": 1638400,
+       "step": 50
+     },
+     {
+       "epoch": 0.06384676775738228,
+       "grad_norm": 1.3926266431808472,
+       "learning_rate": 1.914893617021277e-05,
+       "loss": 0.3329,
+       "num_input_tokens_seen": 1966080,
+       "step": 60
+     },
+     {
+       "epoch": 0.074487895716946,
+       "grad_norm": 1.0763431787490845,
+       "learning_rate": 2.2340425531914894e-05,
+       "loss": 0.2703,
+       "num_input_tokens_seen": 2293760,
+       "step": 70
+     },
+     {
+       "epoch": 0.08512902367650971,
+       "grad_norm": 12.503619194030762,
+       "learning_rate": 2.5531914893617022e-05,
+       "loss": 0.1906,
+       "num_input_tokens_seen": 2621440,
+       "step": 80
+     },
+     {
+       "epoch": 0.09577015163607343,
+       "grad_norm": 0.6052917838096619,
+       "learning_rate": 2.872340425531915e-05,
+       "loss": 0.1476,
+       "num_input_tokens_seen": 2949120,
+       "step": 90
+     },
+     {
+       "epoch": 0.10641127959563713,
+       "grad_norm": 5.584522247314453,
+       "learning_rate": 2.9899103139013456e-05,
+       "loss": 0.1279,
+       "num_input_tokens_seen": 3276800,
+       "step": 100
+     },
+     {
+       "epoch": 0.11705240755520085,
+       "grad_norm": 1.0587092638015747,
+       "learning_rate": 2.9730941704035875e-05,
+       "loss": 0.112,
+       "num_input_tokens_seen": 3604480,
+       "step": 110
+     },
+     {
+       "epoch": 0.12769353551476456,
+       "grad_norm": 2.5089759826660156,
+       "learning_rate": 2.9562780269058297e-05,
+       "loss": 0.1119,
+       "num_input_tokens_seen": 3932160,
+       "step": 120
+     },
+     {
+       "epoch": 0.13833466347432827,
+       "grad_norm": 4.025810241699219,
+       "learning_rate": 2.939461883408072e-05,
+       "loss": 0.1155,
+       "num_input_tokens_seen": 4259840,
+       "step": 130
+     },
+     {
+       "epoch": 0.148975791433892,
+       "grad_norm": 0.6721552014350891,
+       "learning_rate": 2.922645739910314e-05,
+       "loss": 0.0937,
+       "num_input_tokens_seen": 4587520,
+       "step": 140
+     },
+     {
+       "epoch": 0.1596169193934557,
+       "grad_norm": 4.8363542556762695,
+       "learning_rate": 2.905829596412556e-05,
+       "loss": 0.089,
+       "num_input_tokens_seen": 4915200,
+       "step": 150
+     },
+     {
+       "epoch": 0.17025804735301941,
+       "grad_norm": 13.355521202087402,
+       "learning_rate": 2.889013452914798e-05,
+       "loss": 0.0525,
+       "num_input_tokens_seen": 5242880,
+       "step": 160
+     },
+     {
+       "epoch": 0.18089917531258312,
+       "grad_norm": 17.72276496887207,
+       "learning_rate": 2.8721973094170402e-05,
+       "loss": 0.0699,
+       "num_input_tokens_seen": 5570560,
+       "step": 170
+     },
+     {
+       "epoch": 0.19154030327214686,
+       "grad_norm": 3.537041187286377,
+       "learning_rate": 2.8553811659192828e-05,
+       "loss": 0.0811,
+       "num_input_tokens_seen": 5898240,
+       "step": 180
+     },
+     {
+       "epoch": 0.20218143123171056,
+       "grad_norm": 0.13461732864379883,
+       "learning_rate": 2.8385650224215247e-05,
+       "loss": 0.0763,
+       "num_input_tokens_seen": 6225920,
+       "step": 190
+     },
+     {
+       "epoch": 0.21282255919127427,
+       "grad_norm": 9.155119895935059,
+       "learning_rate": 2.821748878923767e-05,
+       "loss": 0.1048,
+       "num_input_tokens_seen": 6553600,
+       "step": 200
+     },
+     {
+       "epoch": 0.22346368715083798,
+       "grad_norm": 0.7209023833274841,
+       "learning_rate": 2.804932735426009e-05,
+       "loss": 0.1231,
+       "num_input_tokens_seen": 6881280,
+       "step": 210
+     },
+     {
+       "epoch": 0.2341048151104017,
+       "grad_norm": 0.5195837020874023,
+       "learning_rate": 2.788116591928251e-05,
+       "loss": 0.0537,
+       "num_input_tokens_seen": 7208960,
+       "step": 220
+     },
+     {
+       "epoch": 0.24474594306996542,
+       "grad_norm": 3.8807427883148193,
+       "learning_rate": 2.7713004484304933e-05,
+       "loss": 0.0579,
+       "num_input_tokens_seen": 7536640,
+       "step": 230
+     },
+     {
+       "epoch": 0.2553870710295291,
+       "grad_norm": 3.4100818634033203,
+       "learning_rate": 2.7544843049327355e-05,
+       "loss": 0.062,
+       "num_input_tokens_seen": 7864320,
+       "step": 240
+     },
+     {
+       "epoch": 0.26602819898909286,
+       "grad_norm": 0.3366034924983978,
+       "learning_rate": 2.7376681614349774e-05,
+       "loss": 0.0298,
+       "num_input_tokens_seen": 8192000,
+       "step": 250
+     },
+     {
+       "epoch": 0.26602819898909286,
+       "eval_accuracy": 0.99,
+       "eval_loss": 0.044801026582717896,
+       "eval_runtime": 1.1309,
+       "eval_samples_per_second": 442.123,
+       "eval_steps_per_second": 55.707,
+       "num_input_tokens_seen": 8192000,
+       "step": 250
+     },
+     {
+       "epoch": 0.27666932694865654,
+       "grad_norm": 1.2192944288253784,
+       "learning_rate": 2.72085201793722e-05,
+       "loss": 0.0562,
+       "num_input_tokens_seen": 8519680,
+       "step": 260
+     },
+     {
+       "epoch": 0.28731045490822027,
+       "grad_norm": 0.7389326691627502,
+       "learning_rate": 2.7040358744394622e-05,
+       "loss": 0.0435,
+       "num_input_tokens_seen": 8847360,
+       "step": 270
+     },
+     {
+       "epoch": 0.297951582867784,
+       "grad_norm": 1.691129446029663,
+       "learning_rate": 2.687219730941704e-05,
+       "loss": 0.0256,
+       "num_input_tokens_seen": 9175040,
+       "step": 280
+     },
+     {
+       "epoch": 0.3085927108273477,
+       "grad_norm": 0.20158784091472626,
+       "learning_rate": 2.6704035874439464e-05,
+       "loss": 0.08,
+       "num_input_tokens_seen": 9502720,
+       "step": 290
+     },
+     {
+       "epoch": 0.3192338387869114,
+       "grad_norm": 0.4045298099517822,
+       "learning_rate": 2.6535874439461886e-05,
+       "loss": 0.0174,
+       "num_input_tokens_seen": 9830400,
+       "step": 300
+     },
+     {
+       "epoch": 0.32987496674647515,
+       "grad_norm": 5.865575313568115,
+       "learning_rate": 2.6367713004484305e-05,
+       "loss": 0.0701,
+       "num_input_tokens_seen": 10158080,
+       "step": 310
+     },
+     {
+       "epoch": 0.34051609470603883,
+       "grad_norm": 12.122817993164062,
+       "learning_rate": 2.6199551569506727e-05,
+       "loss": 0.1398,
+       "num_input_tokens_seen": 10485760,
+       "step": 320
+     },
+     {
+       "epoch": 0.35115722266560256,
+       "grad_norm": 0.43689683079719543,
+       "learning_rate": 2.6031390134529146e-05,
+       "loss": 0.0645,
+       "num_input_tokens_seen": 10813440,
+       "step": 330
+     },
+     {
+       "epoch": 0.36179835062516624,
+       "grad_norm": 0.1345166265964508,
+       "learning_rate": 2.586322869955157e-05,
+       "loss": 0.0394,
+       "num_input_tokens_seen": 11141120,
+       "step": 340
+     },
+     {
+       "epoch": 0.37243947858473,
+       "grad_norm": 0.5597580075263977,
+       "learning_rate": 2.5695067264573994e-05,
+       "loss": 0.0534,
+       "num_input_tokens_seen": 11468800,
+       "step": 350
+     },
+     {
+       "epoch": 0.3830806065442937,
+       "grad_norm": 1.6686193943023682,
+       "learning_rate": 2.5526905829596413e-05,
+       "loss": 0.0499,
+       "num_input_tokens_seen": 11796480,
+       "step": 360
+     },
+     {
+       "epoch": 0.3937217345038574,
+       "grad_norm": 0.08618992567062378,
+       "learning_rate": 2.5358744394618835e-05,
+       "loss": 0.0312,
+       "num_input_tokens_seen": 12124160,
+       "step": 370
+     },
+     {
+       "epoch": 0.4043628624634211,
+       "grad_norm": 0.07978615164756775,
+       "learning_rate": 2.5190582959641258e-05,
+       "loss": 0.0488,
+       "num_input_tokens_seen": 12451840,
+       "step": 380
+     },
+     {
+       "epoch": 0.41500399042298486,
+       "grad_norm": 2.9216437339782715,
+       "learning_rate": 2.5022421524663677e-05,
+       "loss": 0.0281,
+       "num_input_tokens_seen": 12779520,
+       "step": 390
+     },
+     {
+       "epoch": 0.42564511838254854,
+       "grad_norm": 2.1254470348358154,
+       "learning_rate": 2.48542600896861e-05,
+       "loss": 0.044,
+       "num_input_tokens_seen": 13107200,
+       "step": 400
+     },
+     {
+       "epoch": 0.43628624634211227,
+       "grad_norm": 0.1027815118432045,
+       "learning_rate": 2.468609865470852e-05,
+       "loss": 0.0278,
+       "num_input_tokens_seen": 13434880,
+       "step": 410
+     },
+     {
+       "epoch": 0.44692737430167595,
+       "grad_norm": 0.15135648846626282,
+       "learning_rate": 2.451793721973094e-05,
+       "loss": 0.0448,
+       "num_input_tokens_seen": 13762560,
+       "step": 420
+     },
+     {
+       "epoch": 0.4575685022612397,
+       "grad_norm": 0.09930180758237839,
+       "learning_rate": 2.4349775784753363e-05,
+       "loss": 0.0294,
+       "num_input_tokens_seen": 14090240,
+       "step": 430
+     },
+     {
+       "epoch": 0.4682096302208034,
+       "grad_norm": 0.37529394030570984,
+       "learning_rate": 2.4181614349775788e-05,
+       "loss": 0.0437,
+       "num_input_tokens_seen": 14417920,
+       "step": 440
+     },
+     {
+       "epoch": 0.4788507581803671,
+       "grad_norm": 0.0906977429986,
+       "learning_rate": 2.4013452914798207e-05,
+       "loss": 0.0276,
+       "num_input_tokens_seen": 14745600,
+       "step": 450
+     },
+     {
+       "epoch": 0.48949188613993083,
+       "grad_norm": 2.0479931831359863,
+       "learning_rate": 2.384529147982063e-05,
+       "loss": 0.0638,
+       "num_input_tokens_seen": 15073280,
+       "step": 460
+     },
+     {
+       "epoch": 0.5001330140994945,
+       "grad_norm": 0.427298903465271,
+       "learning_rate": 2.367713004484305e-05,
+       "loss": 0.0333,
+       "num_input_tokens_seen": 15400960,
+       "step": 470
+     },
+     {
+       "epoch": 0.5107741420590582,
+       "grad_norm": 0.6889400482177734,
+       "learning_rate": 2.350896860986547e-05,
+       "loss": 0.0225,
+       "num_input_tokens_seen": 15728640,
+       "step": 480
+     },
+     {
+       "epoch": 0.521415270018622,
+       "grad_norm": 0.06804540008306503,
+       "learning_rate": 2.3340807174887893e-05,
+       "loss": 0.0285,
+       "num_input_tokens_seen": 16056320,
+       "step": 490
+     },
+     {
+       "epoch": 0.5320563979781857,
+       "grad_norm": 0.20838595926761627,
+       "learning_rate": 2.3172645739910312e-05,
+       "loss": 0.0141,
+       "num_input_tokens_seen": 16384000,
+       "step": 500
+     },
+     {
+       "epoch": 0.5320563979781857,
+       "eval_accuracy": 0.99,
+       "eval_loss": 0.033007875084877014,
+       "eval_runtime": 1.1242,
+       "eval_samples_per_second": 444.771,
+       "eval_steps_per_second": 56.041,
+       "num_input_tokens_seen": 16384000,
+       "step": 500
+     },
+     {
+       "epoch": 0.5426975259377494,
+       "grad_norm": 0.09140049666166306,
+       "learning_rate": 2.3004484304932734e-05,
+       "loss": 0.019,
+       "num_input_tokens_seen": 16711680,
+       "step": 510
+     },
+     {
+       "epoch": 0.5533386538973131,
+       "grad_norm": 0.06261716037988663,
+       "learning_rate": 2.283632286995516e-05,
+       "loss": 0.0355,
+       "num_input_tokens_seen": 17039360,
+       "step": 520
+     },
+     {
+       "epoch": 0.5639797818568768,
+       "grad_norm": 2.4450674057006836,
+       "learning_rate": 2.266816143497758e-05,
+       "loss": 0.031,
+       "num_input_tokens_seen": 17367040,
+       "step": 530
+     },
+     {
+       "epoch": 0.5746209098164405,
+       "grad_norm": 1.1212217807769775,
+       "learning_rate": 2.25e-05,
+       "loss": 0.0265,
+       "num_input_tokens_seen": 17694720,
+       "step": 540
+     },
+     {
+       "epoch": 0.5852620377760043,
+       "grad_norm": 0.638861358165741,
+       "learning_rate": 2.2331838565022424e-05,
+       "loss": 0.041,
+       "num_input_tokens_seen": 18022400,
+       "step": 550
+     },
+     {
+       "epoch": 0.595903165735568,
+       "grad_norm": 0.8384909629821777,
+       "learning_rate": 2.2163677130044843e-05,
+       "loss": 0.0377,
+       "num_input_tokens_seen": 18350080,
+       "step": 560
+     },
+     {
+       "epoch": 0.6065442936951316,
+       "grad_norm": 2.6054413318634033,
+       "learning_rate": 2.1995515695067265e-05,
+       "loss": 0.0621,
+       "num_input_tokens_seen": 18677760,
+       "step": 570
+     },
+     {
+       "epoch": 0.6171854216546954,
+       "grad_norm": 0.05188291519880295,
+       "learning_rate": 2.1827354260089687e-05,
+       "loss": 0.0089,
+       "num_input_tokens_seen": 19005440,
+       "step": 580
+     },
+     {
+       "epoch": 0.6278265496142591,
+       "grad_norm": 6.18527889251709,
+       "learning_rate": 2.1659192825112106e-05,
+       "loss": 0.0623,
+       "num_input_tokens_seen": 19333120,
+       "step": 590
+     },
+     {
+       "epoch": 0.6384676775738228,
+       "grad_norm": 4.499662399291992,
+       "learning_rate": 2.149103139013453e-05,
+       "loss": 0.0413,
+       "num_input_tokens_seen": 19660800,
+       "step": 600
+     },
+     {
+       "epoch": 0.6491088055333866,
+       "grad_norm": 0.06525593250989914,
+       "learning_rate": 2.1322869955156954e-05,
+       "loss": 0.0268,
+       "num_input_tokens_seen": 19988480,
+       "step": 610
+     },
+     {
+       "epoch": 0.6597499334929503,
+       "grad_norm": 0.7937769889831543,
+       "learning_rate": 2.1154708520179373e-05,
+       "loss": 0.0294,
+       "num_input_tokens_seen": 20316160,
+       "step": 620
+     },
+     {
+       "epoch": 0.6703910614525139,
+       "grad_norm": 0.42232292890548706,
+       "learning_rate": 2.0986547085201796e-05,
+       "loss": 0.0086,
+       "num_input_tokens_seen": 20643840,
+       "step": 630
+     },
+     {
+       "epoch": 0.6810321894120777,
+       "grad_norm": 0.23680944740772247,
+       "learning_rate": 2.0818385650224215e-05,
+       "loss": 0.0182,
+       "num_input_tokens_seen": 20971520,
+       "step": 640
+     },
+     {
+       "epoch": 0.6916733173716414,
+       "grad_norm": 0.8892483115196228,
+       "learning_rate": 2.0650224215246637e-05,
+       "loss": 0.0158,
+       "num_input_tokens_seen": 21299200,
+       "step": 650
+     },
+     {
+       "epoch": 0.7023144453312051,
+       "grad_norm": 9.271723747253418,
+       "learning_rate": 2.048206278026906e-05,
+       "loss": 0.0332,
+       "num_input_tokens_seen": 21626880,
+       "step": 660
+     },
+     {
+       "epoch": 0.7129555732907689,
+       "grad_norm": 0.681903600692749,
+       "learning_rate": 2.0313901345291478e-05,
+       "loss": 0.0402,
+       "num_input_tokens_seen": 21954560,
+       "step": 670
+     },
+     {
+       "epoch": 0.7235967012503325,
+       "grad_norm": 2.4827804565429688,
+       "learning_rate": 2.01457399103139e-05,
+       "loss": 0.0297,
+       "num_input_tokens_seen": 22282240,
+       "step": 680
+     },
+     {
+       "epoch": 0.7342378292098962,
+       "grad_norm": 2.727994203567505,
+       "learning_rate": 1.9977578475336323e-05,
+       "loss": 0.027,
+       "num_input_tokens_seen": 22609920,
+       "step": 690
+     },
+     {
+       "epoch": 0.74487895716946,
+       "grad_norm": 1.978765845298767,
+       "learning_rate": 1.9809417040358745e-05,
+       "loss": 0.0279,
+       "num_input_tokens_seen": 22937600,
+       "step": 700
+     },
+     {
+       "epoch": 0.7555200851290237,
+       "grad_norm": 2.512544870376587,
+       "learning_rate": 1.9641255605381167e-05,
+       "loss": 0.0323,
+       "num_input_tokens_seen": 23265280,
+       "step": 710
+     },
+     {
+       "epoch": 0.7661612130885874,
+       "grad_norm": 5.157982349395752,
+       "learning_rate": 1.947309417040359e-05,
+       "loss": 0.0514,
+       "num_input_tokens_seen": 23592960,
+       "step": 720
+     },
+     {
+       "epoch": 0.7768023410481512,
+       "grad_norm": 0.037381790578365326,
+       "learning_rate": 1.930493273542601e-05,
+       "loss": 0.0077,
+       "num_input_tokens_seen": 23920640,
+       "step": 730
+     },
+     {
+       "epoch": 0.7874434690077148,
+       "grad_norm": 1.0004149675369263,
+       "learning_rate": 1.913677130044843e-05,
+       "loss": 0.0315,
+       "num_input_tokens_seen": 24248320,
+       "step": 740
+     },
+     {
+       "epoch": 0.7980845969672785,
+       "grad_norm": 0.046527761965990067,
+       "learning_rate": 1.8968609865470853e-05,
+       "loss": 0.02,
+       "num_input_tokens_seen": 24576000,
+       "step": 750
+     },
+     {
+       "epoch": 0.7980845969672785,
+       "eval_accuracy": 0.99,
+       "eval_loss": 0.02980552613735199,
+       "eval_runtime": 1.1295,
+       "eval_samples_per_second": 442.672,
+       "eval_steps_per_second": 55.777,
+       "num_input_tokens_seen": 24576000,
+       "step": 750
+     },
+     {
+       "epoch": 0.8087257249268422,
+       "grad_norm": 0.3098304867744446,
+       "learning_rate": 1.8800448430493272e-05,
+       "loss": 0.02,
+       "num_input_tokens_seen": 24903680,
+       "step": 760
+     },
+     {
+       "epoch": 0.819366852886406,
+       "grad_norm": 1.8411376476287842,
+       "learning_rate": 1.8632286995515695e-05,
+       "loss": 0.0219,
+       "num_input_tokens_seen": 25231360,
+       "step": 770
+     },
+     {
+       "epoch": 0.8300079808459697,
+       "grad_norm": 0.6672658920288086,
+       "learning_rate": 1.8464125560538117e-05,
+       "loss": 0.0236,
+       "num_input_tokens_seen": 25559040,
+       "step": 780
+     },
+     {
+       "epoch": 0.8406491088055333,
+       "grad_norm": 0.15667960047721863,
+       "learning_rate": 1.829596412556054e-05,
+       "loss": 0.0373,
+       "num_input_tokens_seen": 25886720,
+       "step": 790
+     },
+     {
+       "epoch": 0.8512902367650971,
+       "grad_norm": 0.039243053644895554,
+       "learning_rate": 1.812780269058296e-05,
+       "loss": 0.0118,
+       "num_input_tokens_seen": 26214400,
+       "step": 800
+     },
+     {
+       "epoch": 0.8619313647246608,
+       "grad_norm": 0.9345981478691101,
+       "learning_rate": 1.795964125560538e-05,
+       "loss": 0.0322,
+       "num_input_tokens_seen": 26542080,
+       "step": 810
+     },
+     {
+       "epoch": 0.8725724926842245,
+       "grad_norm": 0.06790352612733841,
+       "learning_rate": 1.7791479820627803e-05,
+       "loss": 0.0097,
+       "num_input_tokens_seen": 26869760,
+       "step": 820
+     },
+     {
+       "epoch": 0.8832136206437883,
+       "grad_norm": 0.065700002014637,
+       "learning_rate": 1.7623318385650225e-05,
+       "loss": 0.0188,
+       "num_input_tokens_seen": 27197440,
+       "step": 830
+     },
+     {
+       "epoch": 0.8938547486033519,
+       "grad_norm": 3.7558648586273193,
+       "learning_rate": 1.7455156950672644e-05,
+       "loss": 0.0253,
+       "num_input_tokens_seen": 27525120,
+       "step": 840
+     },
+     {
+       "epoch": 0.9044958765629156,
+       "grad_norm": 4.746110916137695,
+       "learning_rate": 1.7286995515695067e-05,
+       "loss": 0.0171,
+       "num_input_tokens_seen": 27852800,
+       "step": 850
+     },
+     {
+       "epoch": 0.9151370045224794,
+       "grad_norm": 0.26326820254325867,
+       "learning_rate": 1.711883408071749e-05,
+       "loss": 0.0236,
+       "num_input_tokens_seen": 28180480,
+       "step": 860
+     },
+     {
+       "epoch": 0.9257781324820431,
+       "grad_norm": 0.10672000050544739,
+       "learning_rate": 1.695067264573991e-05,
+       "loss": 0.0085,
+       "num_input_tokens_seen": 28508160,
+       "step": 870
+     },
+     {
+       "epoch": 0.9364192604416068,
+       "grad_norm": 0.16295024752616882,
+       "learning_rate": 1.6782511210762334e-05,
+       "loss": 0.0137,
+       "num_input_tokens_seen": 28835840,
+       "step": 880
+     },
+     {
+       "epoch": 0.9470603884011706,
+       "grad_norm": 4.8795857429504395,
+       "learning_rate": 1.6614349775784756e-05,
+       "loss": 0.0305,
+       "num_input_tokens_seen": 29163520,
+       "step": 890
+     },
+     {
+       "epoch": 0.9577015163607342,
+       "grad_norm": 0.06518769264221191,
+       "learning_rate": 1.6446188340807175e-05,
+       "loss": 0.0117,
+       "num_input_tokens_seen": 29491200,
+       "step": 900
+     },
+     {
+       "epoch": 0.9683426443202979,
+       "grad_norm": 1.4961518049240112,
+       "learning_rate": 1.6278026905829597e-05,
+       "loss": 0.0359,
+       "num_input_tokens_seen": 29818880,
+       "step": 910
+     },
+     {
+       "epoch": 0.9789837722798617,
+       "grad_norm": 1.2783812284469604,
+       "learning_rate": 1.610986547085202e-05,
+       "loss": 0.0405,
+       "num_input_tokens_seen": 30146560,
+       "step": 920
+     },
+     {
+       "epoch": 0.9896249002394254,
+       "grad_norm": 0.15925170481204987,
+       "learning_rate": 1.594170403587444e-05,
+       "loss": 0.0356,
+       "num_input_tokens_seen": 30474240,
+       "step": 930
+     },
+     {
+       "epoch": 1.000266028198989,
+       "grad_norm": 1.536391019821167,
+       "learning_rate": 1.577354260089686e-05,
+       "loss": 0.0159,
+       "num_input_tokens_seen": 30799872,
+       "step": 940
+     },
+     {
+       "epoch": 1.0109071561585528,
+       "grad_norm": 0.04294372722506523,
+       "learning_rate": 1.560538116591928e-05,
+       "loss": 0.0437,
+       "num_input_tokens_seen": 31127552,
+       "step": 950
+     },
+     {
+       "epoch": 1.0215482841181165,
+       "grad_norm": 0.13462825119495392,
+       "learning_rate": 1.5437219730941705e-05,
+       "loss": 0.0129,
+       "num_input_tokens_seen": 31455232,
+       "step": 960
+     },
+     {
+       "epoch": 1.0321894120776802,
+       "grad_norm": 0.03951927274465561,
+       "learning_rate": 1.5269058295964128e-05,
+       "loss": 0.017,
+       "num_input_tokens_seen": 31782912,
+       "step": 970
+     },
+     {
+       "epoch": 1.042830540037244,
+       "grad_norm": 0.12142454832792282,
+       "learning_rate": 1.5100896860986547e-05,
+       "loss": 0.0207,
+       "num_input_tokens_seen": 32110592,
+       "step": 980
+     },
+     {
+       "epoch": 1.0534716679968077,
+       "grad_norm": 0.11652370542287827,
+       "learning_rate": 1.4932735426008969e-05,
+       "loss": 0.0176,
+       "num_input_tokens_seen": 32438272,
+       "step": 990
+     },
+     {
+       "epoch": 1.0641127959563714,
+       "grad_norm": 4.033369064331055,
+       "learning_rate": 1.476457399103139e-05,
+       "loss": 0.0085,
+       "num_input_tokens_seen": 32765952,
+       "step": 1000
+     },
+     {
+       "epoch": 1.0641127959563714,
+       "eval_accuracy": 0.994,
+       "eval_loss": 0.022239448502659798,
+       "eval_runtime": 1.1241,
+       "eval_samples_per_second": 444.814,
+       "eval_steps_per_second": 56.047,
+       "num_input_tokens_seen": 32765952,
+       "step": 1000
+     },
+     {
+       "epoch": 1.0747539239159352,
+       "grad_norm": 0.10022466629743576,
+       "learning_rate": 1.4596412556053812e-05,
+       "loss": 0.0196,
+       "num_input_tokens_seen": 33093632,
+       "step": 1010
+     },
+     {
+       "epoch": 1.085395051875499,
+       "grad_norm": 0.0608280785381794,
+       "learning_rate": 1.4428251121076234e-05,
+       "loss": 0.0244,
+       "num_input_tokens_seen": 33421312,
+       "step": 1020
+     },
+     {
+       "epoch": 1.0960361798350626,
+       "grad_norm": 0.6638007164001465,
+       "learning_rate": 1.4260089686098655e-05,
+       "loss": 0.0049,
+       "num_input_tokens_seen": 33748992,
+       "step": 1030
+     },
+     {
+       "epoch": 1.1066773077946261,
+       "grad_norm": 0.17382824420928955,
+       "learning_rate": 1.4091928251121077e-05,
+       "loss": 0.0106,
+       "num_input_tokens_seen": 34076672,
+       "step": 1040
+     },
+     {
+       "epoch": 1.1173184357541899,
+       "grad_norm": 0.10657654702663422,
+       "learning_rate": 1.3923766816143498e-05,
+       "loss": 0.0381,
+       "num_input_tokens_seen": 34404352,
+       "step": 1050
+     },
+     {
+       "epoch": 1.1279595637137536,
+       "grad_norm": 0.7529979348182678,
+       "learning_rate": 1.375560538116592e-05,
+       "loss": 0.0235,
+       "num_input_tokens_seen": 34732032,
+       "step": 1060
+     },
+     {
+       "epoch": 1.1386006916733173,
+       "grad_norm": 0.07195574790239334,
+       "learning_rate": 1.358744394618834e-05,
+       "loss": 0.0173,
+       "num_input_tokens_seen": 35059712,
+       "step": 1070
+     },
+     {
+       "epoch": 1.149241819632881,
+       "grad_norm": 0.8922456502914429,
+       "learning_rate": 1.3419282511210763e-05,
+       "loss": 0.0201,
+       "num_input_tokens_seen": 35387392,
+       "step": 1080
+     },
+     {
+       "epoch": 1.1598829475924448,
+       "grad_norm": 0.2780587375164032,
+       "learning_rate": 1.3251121076233184e-05,
+       "loss": 0.0071,
+       "num_input_tokens_seen": 35715072,
+       "step": 1090
+     },
+     {
+       "epoch": 1.1705240755520085,
+       "grad_norm": 0.014401647262275219,
+       "learning_rate": 1.3082959641255604e-05,
+       "loss": 0.0025,
+       "num_input_tokens_seen": 36042752,
+       "step": 1100
+     },
+     {
+       "epoch": 1.1811652035115723,
+       "grad_norm": 0.07402833551168442,
+       "learning_rate": 1.2914798206278028e-05,
+       "loss": 0.0038,
+       "num_input_tokens_seen": 36370432,
+       "step": 1110
+     },
+     {
+       "epoch": 1.191806331471136,
+       "grad_norm": 0.035160522907972336,
+       "learning_rate": 1.2746636771300449e-05,
+       "loss": 0.0221,
+       "num_input_tokens_seen": 36698112,
+       "step": 1120
+     },
+     {
+       "epoch": 1.2024474594306997,
+       "grad_norm": 0.23754417896270752,
+       "learning_rate": 1.257847533632287e-05,
+       "loss": 0.0044,
+       "num_input_tokens_seen": 37025792,
+       "step": 1130
+     },
+     {
+       "epoch": 1.2130885873902635,
+       "grad_norm": 0.07629762589931488,
+       "learning_rate": 1.241031390134529e-05,
+       "loss": 0.0119,
+       "num_input_tokens_seen": 37353472,
+       "step": 1140
+     },
+     {
+       "epoch": 1.223729715349827,
+       "grad_norm": 0.23725423216819763,
+       "learning_rate": 1.2242152466367714e-05,
+       "loss": 0.0279,
+       "num_input_tokens_seen": 37681152,
+       "step": 1150
+     },
+     {
+       "epoch": 1.2343708433093907,
+       "grad_norm": 1.0171340703964233,
+       "learning_rate": 1.2073991031390135e-05,
+       "loss": 0.0531,
+       "num_input_tokens_seen": 38008832,
+       "step": 1160
+     },
+     {
+       "epoch": 1.2450119712689545,
+       "grad_norm": 0.016075875610113144,
+       "learning_rate": 1.1905829596412556e-05,
+       "loss": 0.0261,
+       "num_input_tokens_seen": 38336512,
+       "step": 1170
+     },
+     {
+       "epoch": 1.2556530992285182,
+       "grad_norm": 0.8257108330726624,
+       "learning_rate": 1.1737668161434978e-05,
+       "loss": 0.0166,
+       "num_input_tokens_seen": 38664192,
+       "step": 1180
+     },
+     {
+       "epoch": 1.266294227188082,
+       "grad_norm": 0.0884622186422348,
+       "learning_rate": 1.15695067264574e-05,
+       "loss": 0.0077,
+       "num_input_tokens_seen": 38991872,
+       "step": 1190
+     },
+     {
+       "epoch": 1.2769353551476457,
+       "grad_norm": 0.101267971098423,
+       "learning_rate": 1.1401345291479821e-05,
+       "loss": 0.019,
+       "num_input_tokens_seen": 39319552,
+       "step": 1200
+     },
+     {
+       "epoch": 1.2875764831072094,
+       "grad_norm": 2.194119691848755,
+       "learning_rate": 1.1233183856502243e-05,
+       "loss": 0.0131,
+       "num_input_tokens_seen": 39647232,
+       "step": 1210
+     },
+     {
+       "epoch": 1.2982176110667731,
+       "grad_norm": 2.7684483528137207,
+       "learning_rate": 1.1065022421524664e-05,
+       "loss": 0.0076,
+       "num_input_tokens_seen": 39974912,
+       "step": 1220
+     },
+     {
+       "epoch": 1.3088587390263369,
+       "grad_norm": 2.1547205448150635,
+       "learning_rate": 1.0896860986547085e-05,
+       "loss": 0.0242,
+       "num_input_tokens_seen": 40302592,
+       "step": 1230
+     },
+     {
+       "epoch": 1.3194998669859004,
+       "grad_norm": 0.39225855469703674,
+       "learning_rate": 1.0728699551569507e-05,
+       "loss": 0.013,
+       "num_input_tokens_seen": 40630272,
+       "step": 1240
+     },
+     {
+       "epoch": 1.3301409949454643,
+       "grad_norm": 0.12444789707660675,
+       "learning_rate": 1.056053811659193e-05,
+       "loss": 0.0174,
+       "num_input_tokens_seen": 40957952,
+       "step": 1250
+     },
+     {
+       "epoch": 1.3301409949454643,
+       "eval_accuracy": 0.994,
+       "eval_loss": 0.020717209205031395,
+       "eval_runtime": 1.1258,
+       "eval_samples_per_second": 444.121,
+       "eval_steps_per_second": 55.959,
+       "num_input_tokens_seen": 40957952,
+       "step": 1250
+     },
+     {
+       "epoch": 1.3407821229050279,
+       "grad_norm": 0.224708691239357,
+       "learning_rate": 1.039237668161435e-05,
+       "loss": 0.0087,
+       "num_input_tokens_seen": 41285632,
+       "step": 1260
+     },
+     {
+       "epoch": 1.3514232508645916,
+       "grad_norm": 0.08499462902545929,
+       "learning_rate": 1.022421524663677e-05,
+       "loss": 0.0182,
+       "num_input_tokens_seen": 41613312,
+       "step": 1270
+     },
+     {
+       "epoch": 1.3620643788241553,
+       "grad_norm": 0.05140333250164986,
+       "learning_rate": 1.0056053811659195e-05,
+       "loss": 0.0034,
+       "num_input_tokens_seen": 41940992,
+       "step": 1280
+     },
+     {
+       "epoch": 1.372705506783719,
+       "grad_norm": 0.05546234920620918,
+       "learning_rate": 9.887892376681615e-06,
+       "loss": 0.0117,
+       "num_input_tokens_seen": 42268672,
+       "step": 1290
+     },
+     {
+       "epoch": 1.3833466347432828,
+       "grad_norm": 0.029206566512584686,
+       "learning_rate": 9.719730941704036e-06,
+       "loss": 0.0179,
+       "num_input_tokens_seen": 42596352,
+       "step": 1300
+     },
+     {
+       "epoch": 1.3939877627028465,
+       "grad_norm": 0.3235812485218048,
+       "learning_rate": 9.551569506726456e-06,
+       "loss": 0.0333,
+       "num_input_tokens_seen": 42924032,
+       "step": 1310
+     },
+     {
+       "epoch": 1.4046288906624103,
+       "grad_norm": 4.916908264160156,
+       "learning_rate": 9.38340807174888e-06,
+       "loss": 0.0167,
+       "num_input_tokens_seen": 43251712,
+       "step": 1320
+     },
+     {
+       "epoch": 1.415270018621974,
+       "grad_norm": 0.10124430060386658,
+       "learning_rate": 9.215246636771301e-06,
+       "loss": 0.0299,
+       "num_input_tokens_seen": 43579392,
+       "step": 1330
+     },
+     {
+       "epoch": 1.4259111465815377,
+       "grad_norm": 0.09930448234081268,
+       "learning_rate": 9.047085201793722e-06,
+       "loss": 0.0112,
+       "num_input_tokens_seen": 43907072,
+       "step": 1340
+     },
+     {
+       "epoch": 1.4365522745411012,
+       "grad_norm": 0.1370278298854828,
+       "learning_rate": 8.878923766816144e-06,
+       "loss": 0.0105,
+       "num_input_tokens_seen": 44234752,
+       "step": 1350
+     },
+     {
+       "epoch": 1.4471934025006652,
+       "grad_norm": 1.9884629249572754,
+       "learning_rate": 8.710762331838565e-06,
+       "loss": 0.0093,
+       "num_input_tokens_seen": 44562432,
+       "step": 1360
+     },
+     {
+       "epoch": 1.4578345304602287,
+       "grad_norm": 0.768826961517334,
+       "learning_rate": 8.542600896860987e-06,
+       "loss": 0.0297,
+       "num_input_tokens_seen": 44890112,
+       "step": 1370
+     },
+     {
+       "epoch": 1.4684756584197924,
+       "grad_norm": 0.08758696168661118,
+       "learning_rate": 8.374439461883408e-06,
+       "loss": 0.0234,
1162
+ "num_input_tokens_seen": 45217792,
1163
+ "step": 1380
1164
+ },
1165
+ {
1166
+ "epoch": 1.4791167863793562,
1167
+ "grad_norm": 0.1405934989452362,
1168
+ "learning_rate": 8.20627802690583e-06,
1169
+ "loss": 0.0072,
1170
+ "num_input_tokens_seen": 45545472,
1171
+ "step": 1390
1172
+ },
1173
+ {
1174
+ "epoch": 1.48975791433892,
1175
+ "grad_norm": 0.32703763246536255,
1176
+ "learning_rate": 8.03811659192825e-06,
1177
+ "loss": 0.0023,
1178
+ "num_input_tokens_seen": 45873152,
1179
+ "step": 1400
1180
+ },
1181
+ {
1182
+ "epoch": 1.5003990422984836,
1183
+ "grad_norm": 0.8952039480209351,
1184
+ "learning_rate": 7.869955156950673e-06,
1185
+ "loss": 0.0183,
1186
+ "num_input_tokens_seen": 46200832,
1187
+ "step": 1410
1188
+ },
1189
+ {
1190
+ "epoch": 1.5110401702580474,
1191
+ "grad_norm": 0.2962280213832855,
1192
+ "learning_rate": 7.701793721973095e-06,
1193
+ "loss": 0.0013,
1194
+ "num_input_tokens_seen": 46528512,
1195
+ "step": 1420
1196
+ },
1197
+ {
1198
+ "epoch": 1.5216812982176111,
1199
+ "grad_norm": 2.0377979278564453,
1200
+ "learning_rate": 7.533632286995516e-06,
1201
+ "loss": 0.0195,
1202
+ "num_input_tokens_seen": 46856192,
1203
+ "step": 1430
1204
+ },
1205
+ {
1206
+ "epoch": 1.5323224261771746,
1207
+ "grad_norm": 0.08011902123689651,
1208
+ "learning_rate": 7.365470852017937e-06,
1209
+ "loss": 0.0065,
1210
+ "num_input_tokens_seen": 47183872,
1211
+ "step": 1440
1212
+ },
1213
+ {
1214
+ "epoch": 1.5429635541367386,
1215
+ "grad_norm": 0.07826100289821625,
1216
+ "learning_rate": 7.197309417040359e-06,
1217
+ "loss": 0.0203,
1218
+ "num_input_tokens_seen": 47511552,
1219
+ "step": 1450
1220
+ },
1221
+ {
1222
+ "epoch": 1.553604682096302,
1223
+ "grad_norm": 0.08626201748847961,
1224
+ "learning_rate": 7.02914798206278e-06,
1225
+ "loss": 0.0123,
1226
+ "num_input_tokens_seen": 47839232,
1227
+ "step": 1460
1228
+ },
1229
+ {
1230
+ "epoch": 1.564245810055866,
1231
+ "grad_norm": 1.227737545967102,
1232
+ "learning_rate": 6.860986547085202e-06,
1233
+ "loss": 0.0159,
1234
+ "num_input_tokens_seen": 48166912,
1235
+ "step": 1470
1236
+ },
1237
+ {
1238
+ "epoch": 1.5748869380154296,
1239
+ "grad_norm": 0.45808491110801697,
1240
+ "learning_rate": 6.692825112107623e-06,
1241
+ "loss": 0.0182,
1242
+ "num_input_tokens_seen": 48494592,
1243
+ "step": 1480
1244
+ },
1245
+ {
1246
+ "epoch": 1.5855280659749933,
1247
+ "grad_norm": 0.19725441932678223,
1248
+ "learning_rate": 6.524663677130045e-06,
1249
+ "loss": 0.011,
1250
+ "num_input_tokens_seen": 48822272,
1251
+ "step": 1490
1252
+ },
1253
+ {
1254
+ "epoch": 1.596169193934557,
1255
+ "grad_norm": 0.11997473984956741,
1256
+ "learning_rate": 6.356502242152466e-06,
1257
+ "loss": 0.0104,
1258
+ "num_input_tokens_seen": 49149952,
1259
+ "step": 1500
1260
+ },
1261
+ {
1262
+ "epoch": 1.596169193934557,
1263
+ "eval_accuracy": 0.996,
1264
+ "eval_loss": 0.02015475556254387,
1265
+ "eval_runtime": 1.1247,
1266
+ "eval_samples_per_second": 444.581,
1267
+ "eval_steps_per_second": 56.017,
1268
+ "num_input_tokens_seen": 49149952,
1269
+ "step": 1500
1270
+ },
1271
+ {
1272
+ "epoch": 1.6068103218941208,
1273
+ "grad_norm": 0.08161328732967377,
1274
+ "learning_rate": 6.188340807174889e-06,
1275
+ "loss": 0.011,
1276
+ "num_input_tokens_seen": 49477632,
1277
+ "step": 1510
1278
+ },
1279
+ {
1280
+ "epoch": 1.6174514498536845,
1281
+ "grad_norm": 0.04879956319928169,
1282
+ "learning_rate": 6.020179372197309e-06,
1283
+ "loss": 0.0034,
1284
+ "num_input_tokens_seen": 49805312,
1285
+ "step": 1520
1286
+ },
1287
+ {
1288
+ "epoch": 1.6280925778132482,
1289
+ "grad_norm": 0.2356010526418686,
1290
+ "learning_rate": 5.8520179372197316e-06,
1291
+ "loss": 0.0305,
1292
+ "num_input_tokens_seen": 50132992,
1293
+ "step": 1530
1294
+ },
1295
+ {
1296
+ "epoch": 1.638733705772812,
1297
+ "grad_norm": 0.08499031513929367,
1298
+ "learning_rate": 5.683856502242152e-06,
1299
+ "loss": 0.0106,
1300
+ "num_input_tokens_seen": 50460672,
1301
+ "step": 1540
1302
+ },
1303
+ {
1304
+ "epoch": 1.6493748337323755,
1305
+ "grad_norm": 0.10495586693286896,
1306
+ "learning_rate": 5.5156950672645745e-06,
1307
+ "loss": 0.012,
1308
+ "num_input_tokens_seen": 50788352,
1309
+ "step": 1550
1310
+ },
1311
+ {
1312
+ "epoch": 1.6600159616919394,
1313
+ "grad_norm": 0.09235712140798569,
1314
+ "learning_rate": 5.347533632286995e-06,
1315
+ "loss": 0.0017,
1316
+ "num_input_tokens_seen": 51116032,
1317
+ "step": 1560
1318
+ },
1319
+ {
1320
+ "epoch": 1.670657089651503,
1321
+ "grad_norm": 0.04202970489859581,
1322
+ "learning_rate": 5.1793721973094175e-06,
1323
+ "loss": 0.0172,
1324
+ "num_input_tokens_seen": 51443712,
1325
+ "step": 1570
1326
+ },
1327
+ {
1328
+ "epoch": 1.681298217611067,
1329
+ "grad_norm": 3.6560862064361572,
1330
+ "learning_rate": 5.011210762331839e-06,
1331
+ "loss": 0.0259,
1332
+ "num_input_tokens_seen": 51771392,
1333
+ "step": 1580
1334
+ },
1335
+ {
1336
+ "epoch": 1.6919393455706304,
1337
+ "grad_norm": 0.20075471699237823,
1338
+ "learning_rate": 4.8430493273542605e-06,
1339
+ "loss": 0.0144,
1340
+ "num_input_tokens_seen": 52099072,
1341
+ "step": 1590
1342
+ },
1343
+ {
1344
+ "epoch": 1.7025804735301941,
1345
+ "grad_norm": 0.14858105778694153,
1346
+ "learning_rate": 4.674887892376682e-06,
1347
+ "loss": 0.0099,
1348
+ "num_input_tokens_seen": 52426752,
1349
+ "step": 1600
1350
+ },
1351
+ {
1352
+ "epoch": 1.7132216014897579,
1353
+ "grad_norm": 0.08154450356960297,
1354
+ "learning_rate": 4.506726457399103e-06,
1355
+ "loss": 0.0155,
1356
+ "num_input_tokens_seen": 52754432,
1357
+ "step": 1610
1358
+ },
1359
+ {
1360
+ "epoch": 1.7238627294493216,
1361
+ "grad_norm": 0.030162209644913673,
1362
+ "learning_rate": 4.338565022421525e-06,
1363
+ "loss": 0.0087,
1364
+ "num_input_tokens_seen": 53082112,
1365
+ "step": 1620
1366
+ },
1367
+ {
1368
+ "epoch": 1.7345038574088854,
1369
+ "grad_norm": 0.058421239256858826,
1370
+ "learning_rate": 4.170403587443946e-06,
1371
+ "loss": 0.0205,
1372
+ "num_input_tokens_seen": 53409792,
1373
+ "step": 1630
1374
+ },
1375
+ {
1376
+ "epoch": 1.745144985368449,
1377
+ "grad_norm": 0.9610540270805359,
1378
+ "learning_rate": 4.002242152466368e-06,
1379
+ "loss": 0.0084,
1380
+ "num_input_tokens_seen": 53737472,
1381
+ "step": 1640
1382
+ },
1383
+ {
1384
+ "epoch": 1.7557861133280128,
1385
+ "grad_norm": 0.3001765310764313,
1386
+ "learning_rate": 3.834080717488789e-06,
1387
+ "loss": 0.0154,
1388
+ "num_input_tokens_seen": 54065152,
1389
+ "step": 1650
1390
+ },
1391
+ {
1392
+ "epoch": 1.7664272412875763,
1393
+ "grad_norm": 0.07005713880062103,
1394
+ "learning_rate": 3.665919282511211e-06,
1395
+ "loss": 0.0166,
1396
+ "num_input_tokens_seen": 54392832,
1397
+ "step": 1660
1398
+ },
1399
+ {
1400
+ "epoch": 1.7770683692471403,
1401
+ "grad_norm": 0.044125888496637344,
1402
+ "learning_rate": 3.4977578475336323e-06,
1403
+ "loss": 0.0016,
1404
+ "num_input_tokens_seen": 54720512,
1405
+ "step": 1670
1406
+ },
1407
+ {
1408
+ "epoch": 1.7877094972067038,
1409
+ "grad_norm": 1.5570340156555176,
1410
+ "learning_rate": 3.329596412556054e-06,
1411
+ "loss": 0.0208,
1412
+ "num_input_tokens_seen": 55048192,
1413
+ "step": 1680
1414
+ },
1415
+ {
1416
+ "epoch": 1.7983506251662678,
1417
+ "grad_norm": 0.12797504663467407,
1418
+ "learning_rate": 3.1614349775784753e-06,
1419
+ "loss": 0.0127,
1420
+ "num_input_tokens_seen": 55375872,
1421
+ "step": 1690
1422
+ },
1423
+ {
1424
+ "epoch": 1.8089917531258313,
1425
+ "grad_norm": 0.12429122626781464,
1426
+ "learning_rate": 2.9932735426008968e-06,
1427
+ "loss": 0.0015,
1428
+ "num_input_tokens_seen": 55703552,
1429
+ "step": 1700
1430
+ },
1431
+ {
1432
+ "epoch": 1.819632881085395,
1433
+ "grad_norm": 0.15149074792861938,
1434
+ "learning_rate": 2.8251121076233182e-06,
1435
+ "loss": 0.0083,
1436
+ "num_input_tokens_seen": 56031232,
1437
+ "step": 1710
1438
+ },
1439
+ {
1440
+ "epoch": 1.8302740090449587,
1441
+ "grad_norm": 0.10725903511047363,
1442
+ "learning_rate": 2.65695067264574e-06,
1443
+ "loss": 0.0071,
1444
+ "num_input_tokens_seen": 56358912,
1445
+ "step": 1720
1446
+ },
1447
+ {
1448
+ "epoch": 1.8409151370045225,
1449
+ "grad_norm": 0.1267658919095993,
1450
+ "learning_rate": 2.4887892376681616e-06,
1451
+ "loss": 0.0087,
1452
+ "num_input_tokens_seen": 56686592,
1453
+ "step": 1730
1454
+ },
1455
+ {
1456
+ "epoch": 1.8515562649640862,
1457
+ "grad_norm": 0.35703355073928833,
1458
+ "learning_rate": 2.320627802690583e-06,
1459
+ "loss": 0.0068,
1460
+ "num_input_tokens_seen": 57014272,
1461
+ "step": 1740
1462
+ },
1463
+ {
1464
+ "epoch": 1.86219739292365,
1465
+ "grad_norm": 0.7102775573730469,
1466
+ "learning_rate": 2.1524663677130046e-06,
1467
+ "loss": 0.0237,
1468
+ "num_input_tokens_seen": 57341952,
1469
+ "step": 1750
1470
+ },
1471
+ {
1472
+ "epoch": 1.86219739292365,
1473
+ "eval_accuracy": 0.996,
1474
+ "eval_loss": 0.018471572548151016,
1475
+ "eval_runtime": 1.1252,
1476
+ "eval_samples_per_second": 444.377,
1477
+ "eval_steps_per_second": 55.992,
1478
+ "num_input_tokens_seen": 57341952,
1479
+ "step": 1750
1480
+ },
1481
+ {
1482
+ "epoch": 1.8728385208832137,
1483
+ "grad_norm": 0.04301352798938751,
1484
+ "learning_rate": 1.984304932735426e-06,
1485
+ "loss": 0.0102,
1486
+ "num_input_tokens_seen": 57669632,
1487
+ "step": 1760
1488
+ },
1489
+ {
1490
+ "epoch": 1.8834796488427772,
1491
+ "grad_norm": 0.12998220324516296,
1492
+ "learning_rate": 1.8161434977578476e-06,
1493
+ "loss": 0.034,
1494
+ "num_input_tokens_seen": 57997312,
1495
+ "step": 1770
1496
+ },
1497
+ {
1498
+ "epoch": 1.8941207768023411,
1499
+ "grad_norm": 0.05428827181458473,
1500
+ "learning_rate": 1.647982062780269e-06,
1501
+ "loss": 0.0034,
1502
+ "num_input_tokens_seen": 58324992,
1503
+ "step": 1780
1504
+ },
1505
+ {
1506
+ "epoch": 1.9047619047619047,
1507
+ "grad_norm": 0.031001785770058632,
1508
+ "learning_rate": 1.4798206278026905e-06,
1509
+ "loss": 0.0201,
1510
+ "num_input_tokens_seen": 58652672,
1511
+ "step": 1790
1512
+ },
1513
+ {
1514
+ "epoch": 1.9154030327214686,
1515
+ "grad_norm": 0.06974712759256363,
1516
+ "learning_rate": 1.3116591928251122e-06,
1517
+ "loss": 0.0073,
1518
+ "num_input_tokens_seen": 58980352,
1519
+ "step": 1800
1520
+ },
1521
+ {
1522
+ "epoch": 1.9260441606810321,
1523
+ "grad_norm": 0.028872903436422348,
1524
+ "learning_rate": 1.1434977578475337e-06,
1525
+ "loss": 0.0201,
1526
+ "num_input_tokens_seen": 59308032,
1527
+ "step": 1810
1528
+ },
1529
+ {
1530
+ "epoch": 1.9366852886405959,
1531
+ "grad_norm": 0.04791630432009697,
1532
+ "learning_rate": 9.75336322869955e-07,
1533
+ "loss": 0.0365,
1534
+ "num_input_tokens_seen": 59635712,
1535
+ "step": 1820
1536
+ },
1537
+ {
1538
+ "epoch": 1.9473264166001596,
1539
+ "grad_norm": 0.9329636096954346,
1540
+ "learning_rate": 8.071748878923768e-07,
1541
+ "loss": 0.0041,
1542
+ "num_input_tokens_seen": 59963392,
1543
+ "step": 1830
1544
+ },
1545
+ {
1546
+ "epoch": 1.9579675445597233,
1547
+ "grad_norm": 0.2609878182411194,
1548
+ "learning_rate": 6.390134529147982e-07,
1549
+ "loss": 0.0191,
1550
+ "num_input_tokens_seen": 60291072,
1551
+ "step": 1840
1552
+ },
1553
+ {
1554
+ "epoch": 1.968608672519287,
1555
+ "grad_norm": 1.2760034799575806,
1556
+ "learning_rate": 4.7085201793721974e-07,
1557
+ "loss": 0.006,
1558
+ "num_input_tokens_seen": 60618752,
1559
+ "step": 1850
1560
+ },
1561
+ {
1562
+ "epoch": 1.9792498004788508,
1563
+ "grad_norm": 0.5698215961456299,
1564
+ "learning_rate": 3.026905829596413e-07,
1565
+ "loss": 0.0106,
1566
+ "num_input_tokens_seen": 60946432,
1567
+ "step": 1860
1568
+ },
1569
+ {
1570
+ "epoch": 1.9898909284384145,
1571
+ "grad_norm": 0.08324664831161499,
1572
+ "learning_rate": 1.345291479820628e-07,
1573
+ "loss": 0.0096,
1574
+ "num_input_tokens_seen": 61274112,
1575
+ "step": 1870
1576
+ },
1577
+ {
1578
+ "epoch": 1.9984038308060654,
1579
+ "num_input_tokens_seen": 61536256,
1580
+ "step": 1878,
1581
+ "total_flos": 3986132331896832.0,
1582
+ "train_loss": 0.0478181641492338,
1583
+ "train_runtime": 541.4691,
1584
+ "train_samples_per_second": 222.136,
1585
+ "train_steps_per_second": 3.468
1586
+ }
1587
+ ],
1588
+ "logging_steps": 10,
1589
+ "max_steps": 1878,
1590
+ "num_input_tokens_seen": 61536256,
1591
+ "num_train_epochs": 2,
1592
+ "save_steps": 400,
1593
+ "total_flos": 3986132331896832.0,
1594
+ "train_batch_size": 8,
1595
+ "trial_name": null,
1596
+ "trial_params": null
1597
+ }
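
The list closed by the `],` above is `log_history`, the structure `transformers.Trainer` serializes into `trainer_state.json`: one dict per logging interval (`loss`, `learning_rate`, `grad_norm`, `num_input_tokens_seen`, `step`), interleaved with `eval_*` dicts (every 250 steps here). Below is a minimal sketch, assuming the file has been downloaded locally as `trainer_state.json`, of how one might summarize these entries and check the linear learning-rate decay the values above suggest (roughly 1.68e-7 per 10-step logging interval):

```python
import json

# Load the state file written at the end of training (local path assumed).
with open("trainer_state.json") as f:
    state = json.load(f)

history = state["log_history"]

# Per-interval training logs carry "loss"; periodic evaluations carry "eval_loss".
train_logs = [e for e in history if "loss" in e]
eval_logs = [e for e in history if "eval_loss" in e]

print(f"train entries: {len(train_logs)}, eval entries: {len(eval_logs)}")
print(f"last logged eval_loss: {eval_logs[-1]['eval_loss']:.4f} "
      f"at step {eval_logs[-1]['step']}")

# With a linear scheduler, the learning rate drops by a constant amount
# between consecutive logs; the entries above decrease by ~1.68e-7 each.
lrs = [e["learning_rate"] for e in train_logs if "learning_rate" in e]
deltas = [a - b for a, b in zip(lrs, lrs[1:])]
print(f"mean LR drop per logged interval: {sum(deltas) / len(deltas):.3e}")
```

Run against this commit's `trainer_state.json`, the last in-training evaluation reported by the script should be the step-1750 entry logged above (`eval_loss` ≈ 0.0185, `eval_accuracy` 0.996).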