Commit da17ed3
1 Parent(s): 068fb88
new data
- logs/main_log.txt +54 -0
logs/main_log.txt
CHANGED
@@ -106523,3 +106523,57 @@ time (ms)
|
|
106523 |
time (ms)
|
106524 |
iteration 2516/ 292968 | consumed samples: 5152768 | consumed tokens: 666812416 | elapsed time per iteration (ms): 126860.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.729679E+00 | loss scale: 131072.0 | grad norm: 70559.533 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106525 |
time (ms)
|
106526 |
+
iteration 2517/ 292968 | consumed samples: 5154816 | consumed tokens: 667222016 | elapsed time per iteration (ms): 131649.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.677740E+00 | loss scale: 131072.0 | grad norm: 56023.485 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106527 |
+
time (ms)
|
106528 |
+
iteration 2518/ 292968 | consumed samples: 5156864 | consumed tokens: 667631616 | elapsed time per iteration (ms): 133698.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.696238E+00 | loss scale: 131072.0 | grad norm: 57083.392 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106529 |
+
time (ms)
|
106530 |
+
iteration 2519/ 292968 | consumed samples: 5158912 | consumed tokens: 668041216 | elapsed time per iteration (ms): 133130.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.712332E+00 | loss scale: 131072.0 | grad norm: 66522.220 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106531 |
+
time (ms)
|
106532 |
+
iteration 2520/ 292968 | consumed samples: 5160960 | consumed tokens: 668450816 | elapsed time per iteration (ms): 132776.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.699082E+00 | loss scale: 131072.0 | grad norm: 52981.553 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106533 |
+
time (ms)
|
106534 |
+
iteration 2521/ 292968 | consumed samples: 5163008 | consumed tokens: 668860416 | elapsed time per iteration (ms): 133609.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.677561E+00 | loss scale: 131072.0 | grad norm: 49201.207 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106535 |
+
time (ms)
|
106536 |
+
iteration 2522/ 292968 | consumed samples: 5165056 | consumed tokens: 669270016 | elapsed time per iteration (ms): 134264.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.699126E+00 | loss scale: 131072.0 | grad norm: 38187.609 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106537 |
+
time (ms)
|
106538 |
+
iteration 2523/ 292968 | consumed samples: 5167104 | consumed tokens: 669679616 | elapsed time per iteration (ms): 133050.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.711785E+00 | loss scale: 131072.0 | grad norm: 50523.507 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106539 |
+
time (ms)
|
106540 |
+
iteration 2524/ 292968 | consumed samples: 5169152 | consumed tokens: 670089216 | elapsed time per iteration (ms): 129836.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.695742E+00 | loss scale: 131072.0 | grad norm: 54330.129 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106541 |
+
time (ms)
|
106542 |
+
iteration 2525/ 292968 | consumed samples: 5171200 | consumed tokens: 670498816 | elapsed time per iteration (ms): 136356.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.732819E+00 | loss scale: 131072.0 | grad norm: 39968.544 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106543 |
+
time (ms)
|
106544 |
+
iteration 2526/ 292968 | consumed samples: 5173248 | consumed tokens: 670908416 | elapsed time per iteration (ms): 134571.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.712886E+00 | loss scale: 131072.0 | grad norm: 51363.977 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106545 |
+
time (ms)
|
106546 |
+
iteration 2527/ 292968 | consumed samples: 5175296 | consumed tokens: 671318016 | elapsed time per iteration (ms): 132047.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.695562E+00 | loss scale: 131072.0 | grad norm: 51765.676 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106547 |
+
time (ms)
|
106548 |
+
iteration 2528/ 292968 | consumed samples: 5177344 | consumed tokens: 671727616 | elapsed time per iteration (ms): 134158.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.707468E+00 | loss scale: 131072.0 | grad norm: 54323.308 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106549 |
+
time (ms)
|
106550 |
+
iteration 2529/ 292968 | consumed samples: 5179392 | consumed tokens: 672137216 | elapsed time per iteration (ms): 131022.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.695577E+00 | loss scale: 131072.0 | grad norm: 41546.541 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106551 |
+
time (ms)
|
106552 |
+
iteration 2530/ 292968 | consumed samples: 5181440 | consumed tokens: 672546816 | elapsed time per iteration (ms): 137329.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.701566E+00 | loss scale: 131072.0 | grad norm: 42285.909 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106553 |
+
time (ms)
|
106554 |
+
iteration 2531/ 292968 | consumed samples: 5183488 | consumed tokens: 672956416 | elapsed time per iteration (ms): 135951.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.682348E+00 | loss scale: 131072.0 | grad norm: 55894.421 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106555 |
+
time (ms)
|
106556 |
+
iteration 2532/ 292968 | consumed samples: 5185536 | consumed tokens: 673366016 | elapsed time per iteration (ms): 134684.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.719293E+00 | loss scale: 131072.0 | grad norm: 64429.092 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106557 |
+
time (ms)
|
106558 |
+
iteration 2533/ 292968 | consumed samples: 5187584 | consumed tokens: 673775616 | elapsed time per iteration (ms): 139215.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.718337E+00 | loss scale: 131072.0 | grad norm: 49058.682 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106559 |
+
time (ms)
|
106560 |
+
iteration 2534/ 292968 | consumed samples: 5189632 | consumed tokens: 674185216 | elapsed time per iteration (ms): 140178.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.713611E+00 | loss scale: 131072.0 | grad norm: 66713.209 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106561 |
+
time (ms)
|
106562 |
+
iteration 2535/ 292968 | consumed samples: 5191680 | consumed tokens: 674594816 | elapsed time per iteration (ms): 137068.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.720226E+00 | loss scale: 131072.0 | grad norm: 70072.153 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106563 |
+
time (ms)
|
106564 |
+
iteration 2536/ 292968 | consumed samples: 5193728 | consumed tokens: 675004416 | elapsed time per iteration (ms): 133750.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.686594E+00 | loss scale: 131072.0 | grad norm: 47463.962 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106565 |
+
time (ms)
|
106566 |
+
iteration 2537/ 292968 | consumed samples: 5195776 | consumed tokens: 675414016 | elapsed time per iteration (ms): 134502.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.718798E+00 | loss scale: 131072.0 | grad norm: 75553.129 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106567 |
+
time (ms)
|
106568 |
+
iteration 2538/ 292968 | consumed samples: 5197824 | consumed tokens: 675823616 | elapsed time per iteration (ms): 132873.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.676980E+00 | loss scale: 131072.0 | grad norm: 72938.459 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106569 |
+
time (ms)
|
106570 |
+
iteration 2539/ 292968 | consumed samples: 5199872 | consumed tokens: 676233216 | elapsed time per iteration (ms): 136842.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.690705E+00 | loss scale: 131072.0 | grad norm: 63805.103 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106571 |
+
time (ms)
|
106572 |
+
iteration 2540/ 292968 | consumed samples: 5201920 | consumed tokens: 676642816 | elapsed time per iteration (ms): 138332.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.699246E+00 | loss scale: 131072.0 | grad norm: 60131.574 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106573 |
+
time (ms)
|
106574 |
+
iteration 2541/ 292968 | consumed samples: 5203968 | consumed tokens: 677052416 | elapsed time per iteration (ms): 137209.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.687837E+00 | loss scale: 131072.0 | grad norm: 57555.686 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106575 |
+
time (ms)
|
106576 |
+
iteration 2542/ 292968 | consumed samples: 5206016 | consumed tokens: 677462016 | elapsed time per iteration (ms): 135834.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.714507E+00 | loss scale: 131072.0 | grad norm: 56971.731 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106577 |
+
time (ms)
|
106578 |
+
iteration 2543/ 292968 | consumed samples: 5208064 | consumed tokens: 677871616 | elapsed time per iteration (ms): 133073.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.731538E+00 | loss scale: 131072.0 | grad norm: 53881.397 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
106579 |
+
time (ms)
|
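The added log lines follow a fixed `key: value | key: value | ...` layout, so the counters can be checked mechanically. The sketch below is a minimal illustration and not part of the training code: `parse_log_line` is a hypothetical helper name, and the two sample lines are copied from the diff above. It assumes every field after the `iteration` field is numeric, which holds for the lines shown here.

```python
import re


def parse_log_line(line):
    """Parse one 'iteration ...' log line into a dict of field name -> number.

    Hypothetical helper for illustration; assumes the ' | '-separated
    'key: value' layout seen in logs/main_log.txt.
    """
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        if part.startswith("iteration"):
            # e.g. "iteration 2516/ 292968"
            m = re.match(r"iteration\s+(\d+)/\s*(\d+)", part)
            if m:
                fields["iteration"] = int(m.group(1))
                fields["total iterations"] = int(m.group(2))
            continue
        # Split on the last colon so keys like "elapsed time per iteration (ms)" stay whole.
        key, sep, value = part.rpartition(":")
        if sep:
            fields[key.strip()] = float(value)
    return fields


# Two consecutive lines copied from the diff.
LINE_2516 = "iteration 2516/ 292968 | consumed samples: 5152768 | consumed tokens: 666812416 | elapsed time per iteration (ms): 126860.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.729679E+00 | loss scale: 131072.0 | grad norm: 70559.533 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |"
LINE_2517 = "iteration 2517/ 292968 | consumed samples: 5154816 | consumed tokens: 667222016 | elapsed time per iteration (ms): 131649.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.677740E+00 | loss scale: 131072.0 | grad norm: 56023.485 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |"

prev, cur = parse_log_line(LINE_2516), parse_log_line(LINE_2517)
sample_step = cur["consumed samples"] - prev["consumed samples"]
token_step = cur["consumed tokens"] - prev["consumed tokens"]
```

In these two lines, `consumed samples` advances by the global batch size (2048) and `consumed tokens` advances by the batch size times the curriculum sequence length (2048 × 200 = 409600) per iteration.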