bigscience-bot committed on
Commit
0a86c3e
1 Parent(s): 18ed1f8
Files changed (1)
  1. logs/main_log.txt +64 -0
logs/main_log.txt CHANGED
@@ -124878,3 +124878,67 @@ time (ms)
124878
  time (ms)
124879
  iteration 3266/ 292968 | consumed samples: 6688768 | consumed tokens: 1003765760 | elapsed time per iteration (ms): 111087.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498115E+00 | loss scale: 131072.0 | grad norm: 40484.037 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124880
  time (ms)
124881
+ iteration 3267/ 292968 | consumed samples: 6690816 | consumed tokens: 1004257280 | elapsed time per iteration (ms): 112981.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.485603E+00 | loss scale: 131072.0 | grad norm: 35172.870 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124882
+ time (ms)
124883
+ iteration 3268/ 292968 | consumed samples: 6692864 | consumed tokens: 1004748800 | elapsed time per iteration (ms): 112046.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.503227E+00 | loss scale: 131072.0 | grad norm: 36791.981 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124884
+ time (ms)
124885
+ iteration 3269/ 292968 | consumed samples: 6694912 | consumed tokens: 1005240320 | elapsed time per iteration (ms): 110197.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.492297E+00 | loss scale: 131072.0 | grad norm: 39721.467 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124886
+ time (ms)
124887
+ iteration 3270/ 292968 | consumed samples: 6696960 | consumed tokens: 1005731840 | elapsed time per iteration (ms): 110041.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.465833E+00 | loss scale: 131072.0 | grad norm: 41592.190 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124888
+ time (ms)
124889
+ iteration 3271/ 292968 | consumed samples: 6699008 | consumed tokens: 1006223360 | elapsed time per iteration (ms): 110297.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.520051E+00 | loss scale: 131072.0 | grad norm: 38770.837 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124890
+ time (ms)
124891
+ iteration 3272/ 292968 | consumed samples: 6701056 | consumed tokens: 1006714880 | elapsed time per iteration (ms): 113682.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.532229E+00 | loss scale: 131072.0 | grad norm: 46863.674 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124892
+ time (ms)
124893
+ iteration 3273/ 292968 | consumed samples: 6703104 | consumed tokens: 1007206400 | elapsed time per iteration (ms): 115764.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.487801E+00 | loss scale: 131072.0 | grad norm: 47275.617 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124894
+ time (ms)
124895
+ iteration 3274/ 292968 | consumed samples: 6705152 | consumed tokens: 1007697920 | elapsed time per iteration (ms): 113611.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.499582E+00 | loss scale: 131072.0 | grad norm: 43028.621 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124896
+ time (ms)
124897
+ iteration 3275/ 292968 | consumed samples: 6707200 | consumed tokens: 1008189440 | elapsed time per iteration (ms): 111135.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.499293E+00 | loss scale: 131072.0 | grad norm: 43217.821 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124898
+ time (ms)
124899
+ iteration 3276/ 292968 | consumed samples: 6709248 | consumed tokens: 1008680960 | elapsed time per iteration (ms): 111398.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.495284E+00 | loss scale: 131072.0 | grad norm: 35376.715 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124900
+ time (ms)
124901
+ iteration 3277/ 292968 | consumed samples: 6711296 | consumed tokens: 1009172480 | elapsed time per iteration (ms): 112414.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.483550E+00 | loss scale: 131072.0 | grad norm: 34250.645 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124902
+ time (ms)
124903
+ iteration 3278/ 292968 | consumed samples: 6713344 | consumed tokens: 1009664000 | elapsed time per iteration (ms): 111344.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.486138E+00 | loss scale: 131072.0 | grad norm: 30434.955 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124904
+ time (ms)
124905
+ iteration 3279/ 292968 | consumed samples: 6715392 | consumed tokens: 1010155520 | elapsed time per iteration (ms): 112060.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498874E+00 | loss scale: 131072.0 | grad norm: 29348.389 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124906
+ time (ms)
124907
+ iteration 3280/ 292968 | consumed samples: 6717440 | consumed tokens: 1010647040 | elapsed time per iteration (ms): 112820.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.497196E+00 | loss scale: 131072.0 | grad norm: 29673.133 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124908
+ time (ms)
124909
+ iteration 3281/ 292968 | consumed samples: 6719488 | consumed tokens: 1011138560 | elapsed time per iteration (ms): 111234.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.499080E+00 | loss scale: 131072.0 | grad norm: 40415.963 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124910
+ time (ms)
124911
+ iteration 3282/ 292968 | consumed samples: 6721536 | consumed tokens: 1011630080 | elapsed time per iteration (ms): 111552.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.491541E+00 | loss scale: 131072.0 | grad norm: 57029.381 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124912
+ time (ms)
124913
+ iteration 3283/ 292968 | consumed samples: 6723584 | consumed tokens: 1012121600 | elapsed time per iteration (ms): 112426.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.497360E+00 | loss scale: 131072.0 | grad norm: 59242.468 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124914
+ time (ms)
124915
+ iteration 3284/ 292968 | consumed samples: 6725632 | consumed tokens: 1012613120 | elapsed time per iteration (ms): 111149.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.476556E+00 | loss scale: 131072.0 | grad norm: 45191.526 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124916
+ time (ms)
124917
+ iteration 3285/ 292968 | consumed samples: 6727680 | consumed tokens: 1013104640 | elapsed time per iteration (ms): 113840.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.492275E+00 | loss scale: 131072.0 | grad norm: 36899.796 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124918
+ time (ms)
124919
+ iteration 3286/ 292968 | consumed samples: 6729728 | consumed tokens: 1013596160 | elapsed time per iteration (ms): 113981.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.471767E+00 | loss scale: 131072.0 | grad norm: 42014.104 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124920
+ time (ms)
124921
+ iteration 3287/ 292968 | consumed samples: 6731776 | consumed tokens: 1014087680 | elapsed time per iteration (ms): 113840.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.475223E+00 | loss scale: 131072.0 | grad norm: 45709.099 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124922
+ time (ms)
124923
+ iteration 3288/ 292968 | consumed samples: 6733824 | consumed tokens: 1014579200 | elapsed time per iteration (ms): 112154.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.503000E+00 | loss scale: 131072.0 | grad norm: 46516.672 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124924
+ time (ms)
124925
+ iteration 3289/ 292968 | consumed samples: 6735872 | consumed tokens: 1015070720 | elapsed time per iteration (ms): 110548.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.484241E+00 | loss scale: 131072.0 | grad norm: 37206.769 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124926
+ time (ms)
124927
+ iteration 3290/ 292968 | consumed samples: 6737920 | consumed tokens: 1015562240 | elapsed time per iteration (ms): 112012.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.478825E+00 | loss scale: 131072.0 | grad norm: 39774.517 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124928
+ time (ms)
124929
+ iteration 3291/ 292968 | consumed samples: 6739968 | consumed tokens: 1016053760 | elapsed time per iteration (ms): 110410.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.495184E+00 | loss scale: 131072.0 | grad norm: 38254.934 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124930
+ time (ms)
124931
+ iteration 3292/ 292968 | consumed samples: 6742016 | consumed tokens: 1016545280 | elapsed time per iteration (ms): 111588.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.488030E+00 | loss scale: 131072.0 | grad norm: 43122.399 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124932
+ time (ms)
124933
+ iteration 3293/ 292968 | consumed samples: 6744064 | consumed tokens: 1017036800 | elapsed time per iteration (ms): 110742.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.512937E+00 | loss scale: 131072.0 | grad norm: 42031.635 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124934
+ time (ms)
124935
+ iteration 3294/ 292968 | consumed samples: 6746112 | consumed tokens: 1017528320 | elapsed time per iteration (ms): 112447.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.472189E+00 | loss scale: 131072.0 | grad norm: 44968.571 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124936
+ time (ms)
124937
+ iteration 3295/ 292968 | consumed samples: 6748160 | consumed tokens: 1018019840 | elapsed time per iteration (ms): 111572.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.486163E+00 | loss scale: 131072.0 | grad norm: 46456.832 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124938
+ time (ms)
124939
+ iteration 3296/ 292968 | consumed samples: 6750208 | consumed tokens: 1018511360 | elapsed time per iteration (ms): 112407.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.476424E+00 | loss scale: 131072.0 | grad norm: 36053.245 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124940
+ time (ms)
124941
+ iteration 3297/ 292968 | consumed samples: 6752256 | consumed tokens: 1019002880 | elapsed time per iteration (ms): 111913.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.471766E+00 | loss scale: 131072.0 | grad norm: 44322.924 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124942
+ time (ms)
124943
+ iteration 3298/ 292968 | consumed samples: 6754304 | consumed tokens: 1019494400 | elapsed time per iteration (ms): 112625.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.473320E+00 | loss scale: 131072.0 | grad norm: 50050.388 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124944
+ time (ms)