bigscience-bot committed on
Commit
d8d084f
1 Parent(s): 60a143e
Files changed (1)
  1. logs/main_log.txt +66 -0
logs/main_log.txt CHANGED
@@ -124812,3 +124812,69 @@ time (ms)
124812
  time (ms)
124813
  iteration 3233/ 292968 | consumed samples: 6621184 | consumed tokens: 987545600 | elapsed time per iteration (ms): 111505.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527050E+00 | loss scale: 131072.0 | grad norm: 37932.338 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124814
  time (ms)
124815
+ iteration 3234/ 292968 | consumed samples: 6623232 | consumed tokens: 988037120 | elapsed time per iteration (ms): 110767.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.520366E+00 | loss scale: 131072.0 | grad norm: 36172.854 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124816
+ time (ms)
124817
+ iteration 3235/ 292968 | consumed samples: 6625280 | consumed tokens: 988528640 | elapsed time per iteration (ms): 110564.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498188E+00 | loss scale: 131072.0 | grad norm: 37528.273 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124818
+ time (ms)
124819
+ iteration 3236/ 292968 | consumed samples: 6627328 | consumed tokens: 989020160 | elapsed time per iteration (ms): 112747.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.517138E+00 | loss scale: 131072.0 | grad norm: 43856.052 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124820
+ time (ms)
124821
+ iteration 3237/ 292968 | consumed samples: 6629376 | consumed tokens: 989511680 | elapsed time per iteration (ms): 112824.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.533917E+00 | loss scale: 131072.0 | grad norm: 36516.059 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124822
+ time (ms)
124823
+ iteration 3238/ 292968 | consumed samples: 6631424 | consumed tokens: 990003200 | elapsed time per iteration (ms): 111098.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498120E+00 | loss scale: 131072.0 | grad norm: 41361.365 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124824
+ time (ms)
124825
+ iteration 3239/ 292968 | consumed samples: 6633472 | consumed tokens: 990494720 | elapsed time per iteration (ms): 112615.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.509820E+00 | loss scale: 131072.0 | grad norm: 62598.160 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124826
+ time (ms)
124827
+ iteration 3240/ 292968 | consumed samples: 6635520 | consumed tokens: 990986240 | elapsed time per iteration (ms): 110191.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.483434E+00 | loss scale: 131072.0 | grad norm: 55741.853 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124828
+ time (ms)
124829
+ iteration 3241/ 292968 | consumed samples: 6637568 | consumed tokens: 991477760 | elapsed time per iteration (ms): 112055.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.534761E+00 | loss scale: 131072.0 | grad norm: 40162.102 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124830
+ time (ms)
124831
+ iteration 3242/ 292968 | consumed samples: 6639616 | consumed tokens: 991969280 | elapsed time per iteration (ms): 110944.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.518108E+00 | loss scale: 131072.0 | grad norm: 43256.029 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124832
+ time (ms)
124833
+ iteration 3243/ 292968 | consumed samples: 6641664 | consumed tokens: 992460800 | elapsed time per iteration (ms): 110612.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.505326E+00 | loss scale: 131072.0 | grad norm: 34049.259 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124834
+ time (ms)
124835
+ iteration 3244/ 292968 | consumed samples: 6643712 | consumed tokens: 992952320 | elapsed time per iteration (ms): 110996.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.469863E+00 | loss scale: 131072.0 | grad norm: 39566.431 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124836
+ time (ms)
124837
+ iteration 3245/ 292968 | consumed samples: 6645760 | consumed tokens: 993443840 | elapsed time per iteration (ms): 110711.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.496788E+00 | loss scale: 131072.0 | grad norm: 41042.556 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124838
+ time (ms)
124839
+ iteration 3246/ 292968 | consumed samples: 6647808 | consumed tokens: 993935360 | elapsed time per iteration (ms): 112623.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.505213E+00 | loss scale: 131072.0 | grad norm: 41738.715 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124840
+ time (ms)
124841
+ iteration 3247/ 292968 | consumed samples: 6649856 | consumed tokens: 994426880 | elapsed time per iteration (ms): 111326.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.486457E+00 | loss scale: 131072.0 | grad norm: 39878.696 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124842
+ time (ms)
124843
+ iteration 3248/ 292968 | consumed samples: 6651904 | consumed tokens: 994918400 | elapsed time per iteration (ms): 112574.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.469687E+00 | loss scale: 131072.0 | grad norm: 33922.146 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124844
+ time (ms)
124845
+ iteration 3249/ 292968 | consumed samples: 6653952 | consumed tokens: 995409920 | elapsed time per iteration (ms): 110648.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.471512E+00 | loss scale: 131072.0 | grad norm: 33383.898 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124846
+ time (ms)
124847
+ iteration 3250/ 292968 | consumed samples: 6656000 | consumed tokens: 995901440 | elapsed time per iteration (ms): 112336.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.539900E+00 | loss scale: 131072.0 | grad norm: 35349.622 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124848
+ time (ms)
124849
+ iteration 3251/ 292968 | consumed samples: 6658048 | consumed tokens: 996392960 | elapsed time per iteration (ms): 110933.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.479738E+00 | loss scale: 131072.0 | grad norm: 38928.251 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124850
+ time (ms)
124851
+ iteration 3252/ 292968 | consumed samples: 6660096 | consumed tokens: 996884480 | elapsed time per iteration (ms): 110855.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.492504E+00 | loss scale: 131072.0 | grad norm: 35033.057 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124852
+ time (ms)
124853
+ iteration 3253/ 292968 | consumed samples: 6662144 | consumed tokens: 997376000 | elapsed time per iteration (ms): 112026.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.490637E+00 | loss scale: 131072.0 | grad norm: 39358.440 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124854
+ time (ms)
124855
+ iteration 3254/ 292968 | consumed samples: 6664192 | consumed tokens: 997867520 | elapsed time per iteration (ms): 110400.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.475244E+00 | loss scale: 131072.0 | grad norm: 38281.132 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124856
+ time (ms)
124857
+ iteration 3255/ 292968 | consumed samples: 6666240 | consumed tokens: 998359040 | elapsed time per iteration (ms): 111582.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.512143E+00 | loss scale: 131072.0 | grad norm: 41348.295 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124858
+ time (ms)
124859
+ iteration 3256/ 292968 | consumed samples: 6668288 | consumed tokens: 998850560 | elapsed time per iteration (ms): 114295.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.511817E+00 | loss scale: 131072.0 | grad norm: 51514.910 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124860
+ time (ms)
124861
+ iteration 3257/ 292968 | consumed samples: 6670336 | consumed tokens: 999342080 | elapsed time per iteration (ms): 114201.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.490412E+00 | loss scale: 131072.0 | grad norm: 50568.937 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124862
+ time (ms)
124863
+ iteration 3258/ 292968 | consumed samples: 6672384 | consumed tokens: 999833600 | elapsed time per iteration (ms): 112390.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.471197E+00 | loss scale: 131072.0 | grad norm: 48187.013 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124864
+ time (ms)
124865
+ iteration 3259/ 292968 | consumed samples: 6674432 | consumed tokens: 1000325120 | elapsed time per iteration (ms): 112710.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.479379E+00 | loss scale: 131072.0 | grad norm: 40458.309 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124866
+ time (ms)
124867
+ iteration 3260/ 292968 | consumed samples: 6676480 | consumed tokens: 1000816640 | elapsed time per iteration (ms): 113180.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.473989E+00 | loss scale: 131072.0 | grad norm: 33616.338 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124868
+ time (ms)
124869
+ iteration 3261/ 292968 | consumed samples: 6678528 | consumed tokens: 1001308160 | elapsed time per iteration (ms): 111586.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498321E+00 | loss scale: 131072.0 | grad norm: 46783.611 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124870
+ time (ms)
124871
+ iteration 3262/ 292968 | consumed samples: 6680576 | consumed tokens: 1001799680 | elapsed time per iteration (ms): 111004.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.508714E+00 | loss scale: 131072.0 | grad norm: 45630.732 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124872
+ time (ms)
124873
+ iteration 3263/ 292968 | consumed samples: 6682624 | consumed tokens: 1002291200 | elapsed time per iteration (ms): 112232.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.504247E+00 | loss scale: 131072.0 | grad norm: 56956.550 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124874
+ time (ms)
124875
+ iteration 3264/ 292968 | consumed samples: 6684672 | consumed tokens: 1002782720 | elapsed time per iteration (ms): 114908.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.483163E+00 | loss scale: 131072.0 | grad norm: 60085.040 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124876
+ time (ms)
124877
+ iteration 3265/ 292968 | consumed samples: 6686720 | consumed tokens: 1003274240 | elapsed time per iteration (ms): 113398.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.485216E+00 | loss scale: 131072.0 | grad norm: 50799.684 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124878
+ time (ms)
124879
+ iteration 3266/ 292968 | consumed samples: 6688768 | consumed tokens: 1003765760 | elapsed time per iteration (ms): 111087.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498115E+00 | loss scale: 131072.0 | grad norm: 40484.037 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
124880
+ time (ms)
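
The entries above share a `key: value | key: value` layout, so they can be checked mechanically. The following is a minimal sketch, assuming that layout holds for every entry; `to_number` and `parse_line` are illustrative helper names, not part of any training framework. It parses two entries copied from the log and confirms that the per-iteration increase in consumed tokens equals global batch size times curriculum seqlen (2048 × 240 = 491520).

```python
import re

# Entries copied from the log above; the leading "+" is the diff marker.
LOG = """\
 iteration 3233/ 292968 | consumed samples: 6621184 | consumed tokens: 987545600 | elapsed time per iteration (ms): 111505.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527050E+00 | loss scale: 131072.0 | grad norm: 37932.338 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
 time (ms)
+ iteration 3234/ 292968 | consumed samples: 6623232 | consumed tokens: 988037120 | elapsed time per iteration (ms): 110767.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.520366E+00 | loss scale: 131072.0 | grad norm: 36172.854 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
"""


def to_number(text):
    """Return an int when the text is integral, otherwise a float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_line(line):
    """Parse one log line into a dict; returns {} for non-iteration lines."""
    fields = {}
    # Drop the diff marker and surrounding whitespace before splitting.
    for part in line.lstrip("+ ").split("|"):
        part = part.strip()
        if not part:
            continue
        match = re.match(r"iteration\s+(\d+)/\s*(\d+)$", part)
        if match:
            fields["iteration"] = int(match.group(1))
            fields["total iterations"] = int(match.group(2))
            continue
        # Split on the last colon so keys like "... (ms)" stay intact.
        key, sep, value = part.rpartition(":")
        if sep:
            fields[key.strip()] = to_number(value)
    return fields


# Keep only lines that actually describe an iteration.
records = [r for r in map(parse_line, LOG.splitlines()) if "iteration" in r]

prev, curr = records
token_delta = curr["consumed tokens"] - prev["consumed tokens"]
sample_delta = curr["consumed samples"] - prev["consumed samples"]

# Each iteration should consume one global batch of curriculum-length sequences.
assert sample_delta == curr["global batch size"]
assert token_delta == curr["global batch size"] * curr["curriculum seqlen"]
print(curr["iteration"], token_delta, curr["lm loss"])
```

The same loop can be pointed at the full `logs/main_log.txt` to flag any iteration where the token count does not advance by the expected amount.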