bigscience-bot committed on
Commit
5b57cca
1 Parent(s): 8e5ab3a
Files changed (1)
  1. logs/main_log.txt +6 -0
logs/main_log.txt CHANGED
@@ -36567,3 +36567,9 @@ steps: 98000 loss: 2.8400 iter time (s): 0.003 samples/sec: 173625.320
36567
  -------------------------------------------------------------------------------------------------
36568
  iteration 98200/ 152972 | consumed samples: 45198784 | elapsed time per iteration (ms): 7696.6 | learning rate: 7.149E-05 | global batch size: 512 | lm loss: 2.799447E+00 | loss scale: 131072.0 | grad norm: 12783.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
36569
  time (ms)
36570
+ iteration 98400/ 152972 | consumed samples: 45301184 | elapsed time per iteration (ms): 5960.1 | learning rate: 7.110E-05 | global batch size: 512 | lm loss: 2.800086E+00 | loss scale: 262144.0 | grad norm: 27676.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
36571
+ time (ms)
36572
+ iteration 98600/ 152972 | consumed samples: 45403584 | elapsed time per iteration (ms): 5939.2 | learning rate: 7.071E-05 | global batch size: 512 | lm loss: 2.802239E+00 | loss scale: 262144.0 | grad norm: 26204.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
36573
+ time (ms)
36574
+ iteration 98800/ 152972 | consumed samples: 45505984 | elapsed time per iteration (ms): 6202.3 | learning rate: 7.032E-05 | global batch size: 512 | lm loss: 2.799443E+00 | loss scale: 524288.0 | grad norm: 50434.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
36575
+ time (ms)
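
The `iteration` lines in this diff are pipe-delimited `key: value` records, so they can be checked mechanically. The following is a minimal sketch, not part of the training code: `parse_iteration_line` is a hypothetical helper, and the field layout is assumed only from the lines shown above. It parses two of the logged lines and confirms that the growth in consumed samples equals the iteration gap times the global batch size.

```python
import re

# Two lines copied from the diff above (one context line, one added line).
LINES = [
    "iteration 98200/ 152972 | consumed samples: 45198784 | elapsed time per iteration (ms): 7696.6 | learning rate: 7.149E-05 | global batch size: 512 | lm loss: 2.799447E+00 | loss scale: 131072.0 | grad norm: 12783.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "+ iteration 98400/ 152972 | consumed samples: 45301184 | elapsed time per iteration (ms): 5960.1 | learning rate: 7.110E-05 | global batch size: 512 | lm loss: 2.800086E+00 | loss scale: 262144.0 | grad norm: 27676.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |",
]


def parse_iteration_line(line):
    """Hypothetical helper: turn one 'iteration ...' log line into a dict.

    Assumes the pipe-delimited 'key: value' layout seen in this log.
    """
    line = line.lstrip("+ ").strip()  # drop the diff marker, if present
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue  # trailing '|' leaves an empty segment
        if part.startswith("iteration"):
            m = re.match(r"iteration\s+(\d+)\s*/\s*(\d+)", part)
            fields["iteration"] = int(m.group(1))
            fields["total_iterations"] = int(m.group(2))
        else:
            # Split on the last ':' so keys such as
            # 'elapsed time per iteration (ms)' stay intact.
            key, _, value = part.rpartition(":")
            fields[key.strip()] = float(value)
    return fields


prev, curr = (parse_iteration_line(l) for l in LINES)

# Samples consumed between the two records should match
# (iteration gap) x (global batch size).
expected = (curr["iteration"] - prev["iteration"]) * curr["global batch size"]
actual = curr["consumed samples"] - prev["consumed samples"]
print("consumed-sample delta consistent:", actual == expected)
print("loss scale ratio:", curr["loss scale"] / prev["loss scale"])
```

The same dictionary form makes it easy to track the other fields across records, for example the loss scale, which differs between the 98200 and 98400 lines.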