bigscience-bot
committed on
Commit
·
0671faa
1
Parent(s):
3f4dd56
new data
logs/main_log.txt +94 -0
logs/main_log.txt
CHANGED
@@ -67087,3 +67087,97 @@ time (ms)
67087 |
time (ms)
|
67088 |
iteration 672/ 292968 | consumed samples: 1376256 | consumed tokens: 108363776 | elapsed time per iteration (ms): 75677.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67089 |
time (ms)
|
67090 |
+
iteration 673/ 292968 | consumed samples: 1378304 | consumed tokens: 108560384 | elapsed time per iteration (ms): 76553.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67091 |
+
time (ms)
|
67092 |
+
iteration 674/ 292968 | consumed samples: 1380352 | consumed tokens: 108756992 | elapsed time per iteration (ms): 76026.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67093 |
+
time (ms)
|
67094 |
+
iteration 675/ 292968 | consumed samples: 1382400 | consumed tokens: 108953600 | elapsed time per iteration (ms): 75590.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67095 |
+
time (ms)
|
67096 |
+
iteration 676/ 292968 | consumed samples: 1384448 | consumed tokens: 109150208 | elapsed time per iteration (ms): 75609.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67097 |
+
time (ms)
|
67098 |
+
iteration 677/ 292968 | consumed samples: 1386496 | consumed tokens: 109346816 | elapsed time per iteration (ms): 75151.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67099 |
+
time (ms)
|
67100 |
+
iteration 678/ 292968 | consumed samples: 1388544 | consumed tokens: 109543424 | elapsed time per iteration (ms): 75600.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67101 |
+
time (ms)
|
67102 |
+
iteration 679/ 292968 | consumed samples: 1390592 | consumed tokens: 109740032 | elapsed time per iteration (ms): 76321.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67103 |
+
time (ms)
|
67104 |
+
iteration 680/ 292968 | consumed samples: 1392640 | consumed tokens: 109936640 | elapsed time per iteration (ms): 76596.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67105 |
+
time (ms)
|
67106 |
+
iteration 681/ 292968 | consumed samples: 1394688 | consumed tokens: 110133248 | elapsed time per iteration (ms): 74699.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67107 |
+
time (ms)
|
67108 |
+
iteration 682/ 292968 | consumed samples: 1396736 | consumed tokens: 110329856 | elapsed time per iteration (ms): 76971.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67109 |
+
time (ms)
|
67110 |
+
iteration 683/ 292968 | consumed samples: 1398784 | consumed tokens: 110526464 | elapsed time per iteration (ms): 75437.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67111 |
+
time (ms)
|
67112 |
+
iteration 684/ 292968 | consumed samples: 1400832 | consumed tokens: 110723072 | elapsed time per iteration (ms): 77129.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67113 |
+
time (ms)
|
67114 |
+
iteration 685/ 292968 | consumed samples: 1402880 | consumed tokens: 110919680 | elapsed time per iteration (ms): 76671.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67115 |
+
time (ms)
|
67116 |
+
iteration 686/ 292968 | consumed samples: 1404928 | consumed tokens: 111116288 | elapsed time per iteration (ms): 76006.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67117 |
+
time (ms)
|
67118 |
+
iteration 687/ 292968 | consumed samples: 1406976 | consumed tokens: 111312896 | elapsed time per iteration (ms): 76657.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67119 |
+
time (ms)
|
67120 |
+
iteration 688/ 292968 | consumed samples: 1409024 | consumed tokens: 111509504 | elapsed time per iteration (ms): 75831.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67121 |
+
time (ms)
|
67122 |
+
iteration 689/ 292968 | consumed samples: 1411072 | consumed tokens: 111706112 | elapsed time per iteration (ms): 76089.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67123 |
+
time (ms)
|
67124 |
+
iteration 690/ 292968 | consumed samples: 1413120 | consumed tokens: 111902720 | elapsed time per iteration (ms): 76356.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67125 |
+
time (ms)
|
67126 |
+
iteration 691/ 292968 | consumed samples: 1415168 | consumed tokens: 112099328 | elapsed time per iteration (ms): 77592.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67127 |
+
time (ms)
|
67128 |
+
iteration 692/ 292968 | consumed samples: 1417216 | consumed tokens: 112295936 | elapsed time per iteration (ms): 79668.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67129 |
+
time (ms)
|
67130 |
+
iteration 693/ 292968 | consumed samples: 1419264 | consumed tokens: 112492544 | elapsed time per iteration (ms): 76034.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67131 |
+
time (ms)
|
67132 |
+
iteration 694/ 292968 | consumed samples: 1421312 | consumed tokens: 112689152 | elapsed time per iteration (ms): 75553.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67133 |
+
time (ms)
|
67134 |
+
iteration 695/ 292968 | consumed samples: 1423360 | consumed tokens: 112885760 | elapsed time per iteration (ms): 76585.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67135 |
+
time (ms)
|
67136 |
+
iteration 696/ 292968 | consumed samples: 1425408 | consumed tokens: 113082368 | elapsed time per iteration (ms): 77768.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67137 |
+
time (ms)
|
67138 |
+
iteration 697/ 292968 | consumed samples: 1427456 | consumed tokens: 113278976 | elapsed time per iteration (ms): 78986.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67139 |
+
time (ms)
|
67140 |
+
iteration 698/ 292968 | consumed samples: 1429504 | consumed tokens: 113475584 | elapsed time per iteration (ms): 75299.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67141 |
+
time (ms)
|
67142 |
+
iteration 699/ 292968 | consumed samples: 1431552 | consumed tokens: 113672192 | elapsed time per iteration (ms): 76113.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67143 |
+
time (ms)
|
67144 |
+
iteration 700/ 292968 | consumed samples: 1433600 | consumed tokens: 113868800 | elapsed time per iteration (ms): 75831.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67145 |
+
time (ms)
|
67146 |
+
iteration 701/ 292968 | consumed samples: 1435648 | consumed tokens: 114065408 | elapsed time per iteration (ms): 77954.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67147 |
+
time (ms)
|
67148 |
+
iteration 702/ 292968 | consumed samples: 1437696 | consumed tokens: 114262016 | elapsed time per iteration (ms): 76860.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67149 |
+
time (ms)
|
67150 |
+
iteration 703/ 292968 | consumed samples: 1439744 | consumed tokens: 114458624 | elapsed time per iteration (ms): 77549.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67151 |
+
time (ms)
|
67152 |
+
iteration 704/ 292968 | consumed samples: 1441792 | consumed tokens: 114655232 | elapsed time per iteration (ms): 76086.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67153 |
+
time (ms)
|
67154 |
+
iteration 705/ 292968 | consumed samples: 1443840 | consumed tokens: 114851840 | elapsed time per iteration (ms): 75728.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67155 |
+
time (ms)
|
67156 |
+
iteration 706/ 292968 | consumed samples: 1445888 | consumed tokens: 115048448 | elapsed time per iteration (ms): 77004.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67157 |
+
time (ms)
|
67158 |
+
iteration 707/ 292968 | consumed samples: 1447936 | consumed tokens: 115245056 | elapsed time per iteration (ms): 75610.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67159 |
+
time (ms)
|
67160 |
+
iteration 708/ 292968 | consumed samples: 1449984 | consumed tokens: 115441664 | elapsed time per iteration (ms): 76005.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67161 |
+
time (ms)
|
67162 |
+
iteration 709/ 292968 | consumed samples: 1452032 | consumed tokens: 115638272 | elapsed time per iteration (ms): 74977.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67163 |
+
time (ms)
|
67164 |
+
iteration 710/ 292968 | consumed samples: 1454080 | consumed tokens: 115834880 | elapsed time per iteration (ms): 77453.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67165 |
+
time (ms)
|
67166 |
+
iteration 711/ 292968 | consumed samples: 1456128 | consumed tokens: 116031488 | elapsed time per iteration (ms): 74366.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67167 |
+
time (ms)
|
67168 |
+
iteration 712/ 292968 | consumed samples: 1458176 | consumed tokens: 116228096 | elapsed time per iteration (ms): 74400.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67169 |
+
time (ms)
|
67170 |
+
iteration 713/ 292968 | consumed samples: 1460224 | consumed tokens: 116424704 | elapsed time per iteration (ms): 75045.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67171 |
+
time (ms)
|
67172 |
+
iteration 714/ 292968 | consumed samples: 1462272 | consumed tokens: 116621312 | elapsed time per iteration (ms): 75912.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67173 |
+
time (ms)
|
67174 |
+
iteration 715/ 292968 | consumed samples: 1464320 | consumed tokens: 116817920 | elapsed time per iteration (ms): 75331.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67175 |
+
time (ms)
|
67176 |
+
iteration 716/ 292968 | consumed samples: 1466368 | consumed tokens: 117014528 | elapsed time per iteration (ms): 74867.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67177 |
+
time (ms)
|
67178 |
+
iteration 717/ 292968 | consumed samples: 1468416 | consumed tokens: 117211136 | elapsed time per iteration (ms): 76188.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67179 |
+
time (ms)
|
67180 |
+
iteration 718/ 292968 | consumed samples: 1470464 | consumed tokens: 117407744 | elapsed time per iteration (ms): 75181.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67181 |
+
time (ms)
|
67182 |
+
iteration 719/ 292968 | consumed samples: 1472512 | consumed tokens: 117604352 | elapsed time per iteration (ms): 75603.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67183 |
+
time (ms)
|
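The iteration lines in this log can be cross-checked mechanically. The sketch below is a minimal, hypothetical parser (the function name `parse_iteration_line` and the dictionary keys are assumptions for illustration, not part of any training framework). It reads two of the lines shown above and checks that the per-step increase in `consumed tokens` equals `global batch size` times `curriculum seqlen`, which holds for the values logged here (2048 × 96 = 196608).

```python
import re

# Two lines copied from the diff above (iterations 672 and 673).
LINE_672 = (
    "iteration 672/ 292968 | consumed samples: 1376256 | consumed tokens: 108363776 | "
    "elapsed time per iteration (ms): 75677.6 | learning rate: 6.000E-05 | "
    "global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | "
    "curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |"
)
LINE_673 = (
    "iteration 673/ 292968 | consumed samples: 1378304 | consumed tokens: 108560384 | "
    "elapsed time per iteration (ms): 76553.3 | learning rate: 6.000E-05 | "
    "global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | "
    "curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |"
)


def parse_iteration_line(line):
    """Parse one 'iteration N/ TOTAL | key: value | ...' line into a dict.

    Hypothetical helper: keys are the field names as they appear in the log,
    and all values except the iteration counters are stored as floats.
    """
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing '|'
        match = re.match(r"iteration\s+(\d+)/\s*(\d+)$", part)
        if match:
            fields["iteration"] = int(match.group(1))
            fields["total iterations"] = int(match.group(2))
            continue
        # Split on the last ':' so keys such as
        # 'elapsed time per iteration (ms)' stay intact.
        key, sep, value = part.rpartition(":")
        if sep:
            fields[key.strip()] = float(value)
    return fields


prev_step = parse_iteration_line(LINE_672)
this_step = parse_iteration_line(LINE_673)

# Tokens and samples consumed by the single step between the two lines.
tokens_per_step = this_step["consumed tokens"] - prev_step["consumed tokens"]
samples_per_step = this_step["consumed samples"] - prev_step["consumed samples"]
expected_tokens = this_step["global batch size"] * this_step["curriculum seqlen"]

print("tokens per step:", tokens_per_step)
print("matches batch size x seqlen:", tokens_per_step == expected_tokens)
```

The same check can be run over every consecutive pair of iteration lines in `logs/main_log.txt`; a mismatch would suggest a skipped iteration or a change in the curriculum sequence length between the two steps.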