bigscience-bot
committed on
Commit d8d084f
1 Parent(s): 60a143e
new data
logs/main_log.txt +66 -0
logs/main_log.txt
CHANGED
@@ -124812,3 +124812,69 @@ time (ms)
|
|
124812 |
time (ms)
|
124813 |
iteration 3233/ 292968 | consumed samples: 6621184 | consumed tokens: 987545600 | elapsed time per iteration (ms): 111505.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527050E+00 | loss scale: 131072.0 | grad norm: 37932.338 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124814 |
time (ms)
|
124815 |
+
iteration 3234/ 292968 | consumed samples: 6623232 | consumed tokens: 988037120 | elapsed time per iteration (ms): 110767.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.520366E+00 | loss scale: 131072.0 | grad norm: 36172.854 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124816 |
+
time (ms)
|
124817 |
+
iteration 3235/ 292968 | consumed samples: 6625280 | consumed tokens: 988528640 | elapsed time per iteration (ms): 110564.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498188E+00 | loss scale: 131072.0 | grad norm: 37528.273 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124818 |
+
time (ms)
|
124819 |
+
iteration 3236/ 292968 | consumed samples: 6627328 | consumed tokens: 989020160 | elapsed time per iteration (ms): 112747.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.517138E+00 | loss scale: 131072.0 | grad norm: 43856.052 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124820 |
+
time (ms)
|
124821 |
+
iteration 3237/ 292968 | consumed samples: 6629376 | consumed tokens: 989511680 | elapsed time per iteration (ms): 112824.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.533917E+00 | loss scale: 131072.0 | grad norm: 36516.059 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124822 |
+
time (ms)
|
124823 |
+
iteration 3238/ 292968 | consumed samples: 6631424 | consumed tokens: 990003200 | elapsed time per iteration (ms): 111098.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498120E+00 | loss scale: 131072.0 | grad norm: 41361.365 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124824 |
+
time (ms)
|
124825 |
+
iteration 3239/ 292968 | consumed samples: 6633472 | consumed tokens: 990494720 | elapsed time per iteration (ms): 112615.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.509820E+00 | loss scale: 131072.0 | grad norm: 62598.160 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124826 |
+
time (ms)
|
124827 |
+
iteration 3240/ 292968 | consumed samples: 6635520 | consumed tokens: 990986240 | elapsed time per iteration (ms): 110191.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.483434E+00 | loss scale: 131072.0 | grad norm: 55741.853 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124828 |
+
time (ms)
|
124829 |
+
iteration 3241/ 292968 | consumed samples: 6637568 | consumed tokens: 991477760 | elapsed time per iteration (ms): 112055.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.534761E+00 | loss scale: 131072.0 | grad norm: 40162.102 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124830 |
+
time (ms)
|
124831 |
+
iteration 3242/ 292968 | consumed samples: 6639616 | consumed tokens: 991969280 | elapsed time per iteration (ms): 110944.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.518108E+00 | loss scale: 131072.0 | grad norm: 43256.029 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124832 |
+
time (ms)
|
124833 |
+
iteration 3243/ 292968 | consumed samples: 6641664 | consumed tokens: 992460800 | elapsed time per iteration (ms): 110612.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.505326E+00 | loss scale: 131072.0 | grad norm: 34049.259 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124834 |
+
time (ms)
|
124835 |
+
iteration 3244/ 292968 | consumed samples: 6643712 | consumed tokens: 992952320 | elapsed time per iteration (ms): 110996.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.469863E+00 | loss scale: 131072.0 | grad norm: 39566.431 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124836 |
+
time (ms)
|
124837 |
+
iteration 3245/ 292968 | consumed samples: 6645760 | consumed tokens: 993443840 | elapsed time per iteration (ms): 110711.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.496788E+00 | loss scale: 131072.0 | grad norm: 41042.556 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124838 |
+
time (ms)
|
124839 |
+
iteration 3246/ 292968 | consumed samples: 6647808 | consumed tokens: 993935360 | elapsed time per iteration (ms): 112623.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.505213E+00 | loss scale: 131072.0 | grad norm: 41738.715 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124840 |
+
time (ms)
|
124841 |
+
iteration 3247/ 292968 | consumed samples: 6649856 | consumed tokens: 994426880 | elapsed time per iteration (ms): 111326.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.486457E+00 | loss scale: 131072.0 | grad norm: 39878.696 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124842 |
+
time (ms)
|
124843 |
+
iteration 3248/ 292968 | consumed samples: 6651904 | consumed tokens: 994918400 | elapsed time per iteration (ms): 112574.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.469687E+00 | loss scale: 131072.0 | grad norm: 33922.146 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124844 |
+
time (ms)
|
124845 |
+
iteration 3249/ 292968 | consumed samples: 6653952 | consumed tokens: 995409920 | elapsed time per iteration (ms): 110648.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.471512E+00 | loss scale: 131072.0 | grad norm: 33383.898 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124846 |
+
time (ms)
|
124847 |
+
iteration 3250/ 292968 | consumed samples: 6656000 | consumed tokens: 995901440 | elapsed time per iteration (ms): 112336.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.539900E+00 | loss scale: 131072.0 | grad norm: 35349.622 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124848 |
+
time (ms)
|
124849 |
+
iteration 3251/ 292968 | consumed samples: 6658048 | consumed tokens: 996392960 | elapsed time per iteration (ms): 110933.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.479738E+00 | loss scale: 131072.0 | grad norm: 38928.251 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124850 |
+
time (ms)
|
124851 |
+
iteration 3252/ 292968 | consumed samples: 6660096 | consumed tokens: 996884480 | elapsed time per iteration (ms): 110855.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.492504E+00 | loss scale: 131072.0 | grad norm: 35033.057 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124852 |
+
time (ms)
|
124853 |
+
iteration 3253/ 292968 | consumed samples: 6662144 | consumed tokens: 997376000 | elapsed time per iteration (ms): 112026.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.490637E+00 | loss scale: 131072.0 | grad norm: 39358.440 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124854 |
+
time (ms)
|
124855 |
+
iteration 3254/ 292968 | consumed samples: 6664192 | consumed tokens: 997867520 | elapsed time per iteration (ms): 110400.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.475244E+00 | loss scale: 131072.0 | grad norm: 38281.132 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124856 |
+
time (ms)
|
124857 |
+
iteration 3255/ 292968 | consumed samples: 6666240 | consumed tokens: 998359040 | elapsed time per iteration (ms): 111582.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.512143E+00 | loss scale: 131072.0 | grad norm: 41348.295 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124858 |
+
time (ms)
|
124859 |
+
iteration 3256/ 292968 | consumed samples: 6668288 | consumed tokens: 998850560 | elapsed time per iteration (ms): 114295.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.511817E+00 | loss scale: 131072.0 | grad norm: 51514.910 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124860 |
+
time (ms)
|
124861 |
+
iteration 3257/ 292968 | consumed samples: 6670336 | consumed tokens: 999342080 | elapsed time per iteration (ms): 114201.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.490412E+00 | loss scale: 131072.0 | grad norm: 50568.937 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124862 |
+
time (ms)
|
124863 |
+
iteration 3258/ 292968 | consumed samples: 6672384 | consumed tokens: 999833600 | elapsed time per iteration (ms): 112390.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.471197E+00 | loss scale: 131072.0 | grad norm: 48187.013 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124864 |
+
time (ms)
|
124865 |
+
iteration 3259/ 292968 | consumed samples: 6674432 | consumed tokens: 1000325120 | elapsed time per iteration (ms): 112710.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.479379E+00 | loss scale: 131072.0 | grad norm: 40458.309 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124866 |
+
time (ms)
|
124867 |
+
iteration 3260/ 292968 | consumed samples: 6676480 | consumed tokens: 1000816640 | elapsed time per iteration (ms): 113180.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.473989E+00 | loss scale: 131072.0 | grad norm: 33616.338 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124868 |
+
time (ms)
|
124869 |
+
iteration 3261/ 292968 | consumed samples: 6678528 | consumed tokens: 1001308160 | elapsed time per iteration (ms): 111586.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498321E+00 | loss scale: 131072.0 | grad norm: 46783.611 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124870 |
+
time (ms)
|
124871 |
+
iteration 3262/ 292968 | consumed samples: 6680576 | consumed tokens: 1001799680 | elapsed time per iteration (ms): 111004.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.508714E+00 | loss scale: 131072.0 | grad norm: 45630.732 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124872 |
+
time (ms)
|
124873 |
+
iteration 3263/ 292968 | consumed samples: 6682624 | consumed tokens: 1002291200 | elapsed time per iteration (ms): 112232.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.504247E+00 | loss scale: 131072.0 | grad norm: 56956.550 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124874 |
+
time (ms)
|
124875 |
+
iteration 3264/ 292968 | consumed samples: 6684672 | consumed tokens: 1002782720 | elapsed time per iteration (ms): 114908.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.483163E+00 | loss scale: 131072.0 | grad norm: 60085.040 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124876 |
+
time (ms)
|
124877 |
+
iteration 3265/ 292968 | consumed samples: 6686720 | consumed tokens: 1003274240 | elapsed time per iteration (ms): 113398.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.485216E+00 | loss scale: 131072.0 | grad norm: 50799.684 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124878 |
+
time (ms)
|
124879 |
+
iteration 3266/ 292968 | consumed samples: 6688768 | consumed tokens: 1003765760 | elapsed time per iteration (ms): 111087.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498115E+00 | loss scale: 131072.0 | grad norm: 40484.037 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124880 |
+
time (ms)
|