bigscience-bot committed on
Commit 0671faa · 1 Parent(s): 3f4dd56
Files changed (1)
  1. logs/main_log.txt +94 -0
logs/main_log.txt CHANGED
@@ -67087,3 +67087,97 @@ time (ms)
67087
  time (ms)
67088
  iteration 672/ 292968 | consumed samples: 1376256 | consumed tokens: 108363776 | elapsed time per iteration (ms): 75677.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67089
  time (ms)
67090
+ iteration 673/ 292968 | consumed samples: 1378304 | consumed tokens: 108560384 | elapsed time per iteration (ms): 76553.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67091
+ time (ms)
67092
+ iteration 674/ 292968 | consumed samples: 1380352 | consumed tokens: 108756992 | elapsed time per iteration (ms): 76026.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67093
+ time (ms)
67094
+ iteration 675/ 292968 | consumed samples: 1382400 | consumed tokens: 108953600 | elapsed time per iteration (ms): 75590.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67095
+ time (ms)
67096
+ iteration 676/ 292968 | consumed samples: 1384448 | consumed tokens: 109150208 | elapsed time per iteration (ms): 75609.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67097
+ time (ms)
67098
+ iteration 677/ 292968 | consumed samples: 1386496 | consumed tokens: 109346816 | elapsed time per iteration (ms): 75151.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67099
+ time (ms)
67100
+ iteration 678/ 292968 | consumed samples: 1388544 | consumed tokens: 109543424 | elapsed time per iteration (ms): 75600.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67101
+ time (ms)
67102
+ iteration 679/ 292968 | consumed samples: 1390592 | consumed tokens: 109740032 | elapsed time per iteration (ms): 76321.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67103
+ time (ms)
67104
+ iteration 680/ 292968 | consumed samples: 1392640 | consumed tokens: 109936640 | elapsed time per iteration (ms): 76596.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67105
+ time (ms)
67106
+ iteration 681/ 292968 | consumed samples: 1394688 | consumed tokens: 110133248 | elapsed time per iteration (ms): 74699.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67107
+ time (ms)
67108
+ iteration 682/ 292968 | consumed samples: 1396736 | consumed tokens: 110329856 | elapsed time per iteration (ms): 76971.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67109
+ time (ms)
67110
+ iteration 683/ 292968 | consumed samples: 1398784 | consumed tokens: 110526464 | elapsed time per iteration (ms): 75437.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67111
+ time (ms)
67112
+ iteration 684/ 292968 | consumed samples: 1400832 | consumed tokens: 110723072 | elapsed time per iteration (ms): 77129.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67113
+ time (ms)
67114
+ iteration 685/ 292968 | consumed samples: 1402880 | consumed tokens: 110919680 | elapsed time per iteration (ms): 76671.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67115
+ time (ms)
67116
+ iteration 686/ 292968 | consumed samples: 1404928 | consumed tokens: 111116288 | elapsed time per iteration (ms): 76006.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67117
+ time (ms)
67118
+ iteration 687/ 292968 | consumed samples: 1406976 | consumed tokens: 111312896 | elapsed time per iteration (ms): 76657.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67119
+ time (ms)
67120
+ iteration 688/ 292968 | consumed samples: 1409024 | consumed tokens: 111509504 | elapsed time per iteration (ms): 75831.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67121
+ time (ms)
67122
+ iteration 689/ 292968 | consumed samples: 1411072 | consumed tokens: 111706112 | elapsed time per iteration (ms): 76089.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67123
+ time (ms)
67124
+ iteration 690/ 292968 | consumed samples: 1413120 | consumed tokens: 111902720 | elapsed time per iteration (ms): 76356.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67125
+ time (ms)
67126
+ iteration 691/ 292968 | consumed samples: 1415168 | consumed tokens: 112099328 | elapsed time per iteration (ms): 77592.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67127
+ time (ms)
67128
+ iteration 692/ 292968 | consumed samples: 1417216 | consumed tokens: 112295936 | elapsed time per iteration (ms): 79668.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67129
+ time (ms)
67130
+ iteration 693/ 292968 | consumed samples: 1419264 | consumed tokens: 112492544 | elapsed time per iteration (ms): 76034.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67131
+ time (ms)
67132
+ iteration 694/ 292968 | consumed samples: 1421312 | consumed tokens: 112689152 | elapsed time per iteration (ms): 75553.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67133
+ time (ms)
67134
+ iteration 695/ 292968 | consumed samples: 1423360 | consumed tokens: 112885760 | elapsed time per iteration (ms): 76585.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67135
+ time (ms)
67136
+ iteration 696/ 292968 | consumed samples: 1425408 | consumed tokens: 113082368 | elapsed time per iteration (ms): 77768.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67137
+ time (ms)
67138
+ iteration 697/ 292968 | consumed samples: 1427456 | consumed tokens: 113278976 | elapsed time per iteration (ms): 78986.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67139
+ time (ms)
67140
+ iteration 698/ 292968 | consumed samples: 1429504 | consumed tokens: 113475584 | elapsed time per iteration (ms): 75299.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67141
+ time (ms)
67142
+ iteration 699/ 292968 | consumed samples: 1431552 | consumed tokens: 113672192 | elapsed time per iteration (ms): 76113.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67143
+ time (ms)
67144
+ iteration 700/ 292968 | consumed samples: 1433600 | consumed tokens: 113868800 | elapsed time per iteration (ms): 75831.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67145
+ time (ms)
67146
+ iteration 701/ 292968 | consumed samples: 1435648 | consumed tokens: 114065408 | elapsed time per iteration (ms): 77954.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67147
+ time (ms)
67148
+ iteration 702/ 292968 | consumed samples: 1437696 | consumed tokens: 114262016 | elapsed time per iteration (ms): 76860.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67149
+ time (ms)
67150
+ iteration 703/ 292968 | consumed samples: 1439744 | consumed tokens: 114458624 | elapsed time per iteration (ms): 77549.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67151
+ time (ms)
67152
+ iteration 704/ 292968 | consumed samples: 1441792 | consumed tokens: 114655232 | elapsed time per iteration (ms): 76086.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67153
+ time (ms)
67154
+ iteration 705/ 292968 | consumed samples: 1443840 | consumed tokens: 114851840 | elapsed time per iteration (ms): 75728.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67155
+ time (ms)
67156
+ iteration 706/ 292968 | consumed samples: 1445888 | consumed tokens: 115048448 | elapsed time per iteration (ms): 77004.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67157
+ time (ms)
67158
+ iteration 707/ 292968 | consumed samples: 1447936 | consumed tokens: 115245056 | elapsed time per iteration (ms): 75610.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67159
+ time (ms)
67160
+ iteration 708/ 292968 | consumed samples: 1449984 | consumed tokens: 115441664 | elapsed time per iteration (ms): 76005.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67161
+ time (ms)
67162
+ iteration 709/ 292968 | consumed samples: 1452032 | consumed tokens: 115638272 | elapsed time per iteration (ms): 74977.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67163
+ time (ms)
67164
+ iteration 710/ 292968 | consumed samples: 1454080 | consumed tokens: 115834880 | elapsed time per iteration (ms): 77453.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67165
+ time (ms)
67166
+ iteration 711/ 292968 | consumed samples: 1456128 | consumed tokens: 116031488 | elapsed time per iteration (ms): 74366.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67167
+ time (ms)
67168
+ iteration 712/ 292968 | consumed samples: 1458176 | consumed tokens: 116228096 | elapsed time per iteration (ms): 74400.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67169
+ time (ms)
67170
+ iteration 713/ 292968 | consumed samples: 1460224 | consumed tokens: 116424704 | elapsed time per iteration (ms): 75045.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67171
+ time (ms)
67172
+ iteration 714/ 292968 | consumed samples: 1462272 | consumed tokens: 116621312 | elapsed time per iteration (ms): 75912.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67173
+ time (ms)
67174
+ iteration 715/ 292968 | consumed samples: 1464320 | consumed tokens: 116817920 | elapsed time per iteration (ms): 75331.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67175
+ time (ms)
67176
+ iteration 716/ 292968 | consumed samples: 1466368 | consumed tokens: 117014528 | elapsed time per iteration (ms): 74867.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67177
+ time (ms)
67178
+ iteration 717/ 292968 | consumed samples: 1468416 | consumed tokens: 117211136 | elapsed time per iteration (ms): 76188.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67179
+ time (ms)
67180
+ iteration 718/ 292968 | consumed samples: 1470464 | consumed tokens: 117407744 | elapsed time per iteration (ms): 75181.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67181
+ time (ms)
67182
+ iteration 719/ 292968 | consumed samples: 1472512 | consumed tokens: 117604352 | elapsed time per iteration (ms): 75603.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67183
+ time (ms)
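
Note: each `iteration` line in this log is a sequence of `key: value` fields separated by `|`, which makes the entries easy to check mechanically. The following is a minimal sketch, not part of the commit; the helper names `parse_iteration_line` and `tokens_per_iteration` are hypothetical and the field layout is assumed from the lines shown above. It parses two of the logged lines and checks that the increase in `consumed tokens` between consecutive iterations equals `global batch size` × `curriculum seqlen` (2048 × 96 = 196608).

```python
import re

# Two lines copied from the diff above (iterations 672 and 673).
SAMPLE_LINES = [
    " iteration 672/ 292968 | consumed samples: 1376256 | consumed tokens: 108363776 | elapsed time per iteration (ms): 75677.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "+ iteration 673/ 292968 | consumed samples: 1378304 | consumed tokens: 108560384 | elapsed time per iteration (ms): 76553.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |",
]


def parse_iteration_line(line):
    """Hypothetical helper: turn one 'iteration N/ M | key: value | ...' line into a dict.

    Returns None for lines that are not iteration records (e.g. 'time (ms)').
    """
    # Drop the leading diff marker ('+') and surrounding whitespace, if present.
    text = line.lstrip("+ ").strip()
    match = re.match(r"iteration\s+(\d+)/\s*(\d+)\s*\|(.*)", text)
    if match is None:
        return None
    record = {
        "iteration": int(match.group(1)),
        "total_iterations": int(match.group(2)),
    }
    for field in match.group(3).split("|"):
        if ":" not in field:
            continue  # skips the empty piece after the trailing '|'
        key, value = field.split(":", 1)
        record[key.strip()] = float(value)
    return record


def tokens_per_iteration(previous, current):
    """Difference in 'consumed tokens' between two parsed records."""
    return current["consumed tokens"] - previous["consumed tokens"]


records = [parse_iteration_line(line) for line in SAMPLE_LINES]
delta = tokens_per_iteration(records[0], records[1])
expected = records[1]["global batch size"] * records[1]["curriculum seqlen"]
print(delta, expected, delta == expected)
```

The same parsed records could be used to average `elapsed time per iteration (ms)` over a range of iterations; all values are read as floats here for simplicity, which is an assumption of this sketch rather than something the log format requires.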