bigscience-bot committed on
Commit
ec5eed0
1 Parent(s): 952f9f7
Files changed (1)
  1. logs/main_log.txt +5 -0
logs/main_log.txt CHANGED
@@ -13598,3 +13598,8 @@ saving checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/checkpoin
13598
  time (ms) | save-checkpoint: 1132.57
13599
  iteration 39200/ 152972 | consumed samples: 14990784 | consumed tokens: 30701125632 | elapsed time per iteration (ms): 7188.0 | learning rate: 1.815E-04 | global batch size: 512 | lm loss: 2.043959E+00 | loss scale: 2097152.0 | grad norm: 163340.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
13600
  iteration 39400/ 152972 | consumed samples: 15093184 | consumed tokens: 30910840832 | elapsed time per iteration (ms): 6078.2 | learning rate: 1.812E-04 | global batch size: 512 | lm loss: 2.050927E+00 | loss scale: 524288.0 | grad norm: 39812.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
13601
+ iteration 39600/ 152972 | consumed samples: 15195584 | consumed tokens: 31120556032 | elapsed time per iteration (ms): 6080.7 | learning rate: 1.810E-04 | global batch size: 512 | lm loss: 2.059096E+00 | loss scale: 524288.0 | grad norm: 36877.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
13602
+ iteration 39800/ 152972 | consumed samples: 15297984 | consumed tokens: 31330271232 | elapsed time per iteration (ms): 6096.6 | learning rate: 1.807E-04 | global batch size: 512 | lm loss: 2.034562E+00 | loss scale: 524288.0 | grad norm: 40947.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
13603
+ [2021-11-04 21:50:08,652] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=81, lr=[0.0001804599959837998, 0.0001804599959837998], mom=[(0.9, 0.999), (0.9, 0.999)]
13604
+ iteration 40000/ 152972 | consumed samples: 15400384 | consumed tokens: 31539986432 | elapsed time per iteration (ms): 6080.1 | learning rate: 1.805E-04 | global batch size: 512 | lm loss: 2.046368E+00 | loss scale: 1048576.0 | grad norm: 79641.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
13605
+ steps: 40000 loss: 1.9234 iter time (s): 0.003 samples/sec: 168896.024
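The `iteration ...` lines above follow a regular `key: value | key: value |` layout, so they can be split into fields for plotting or sanity checks. The following is a minimal sketch, not part of the training code: the helper name `parse_iteration_line` is hypothetical, and the tokens-per-sample ratio it prints is only what the logged numbers imply, not a documented setting.

```python
import re

def parse_iteration_line(line):
    """Parse one 'iteration N/ TOTAL | key: value | ...' log line into a dict.

    Hypothetical helper for illustration; it assumes the pipe-separated
    layout seen in this log and that every value after a colon is numeric.
    """
    fields = {}
    # Drop surrounding whitespace and a leading diff '+' marker, if present.
    for part in line.strip().lstrip("+").split("|"):
        part = part.strip()
        if not part:
            continue
        # The first field has no colon: "iteration 39600/ 152972".
        m = re.match(r"iteration\s+(\d+)/\s*(\d+)$", part)
        if m:
            fields["iteration"] = int(m.group(1))
            fields["total iterations"] = int(m.group(2))
            continue
        # Split on the last colon so keys such as
        # "elapsed time per iteration (ms)" stay intact.
        key, sep, value = part.rpartition(":")
        if sep:
            fields[key.strip()] = float(value)
    return fields

sample = (
    "+ iteration 39600/ 152972 | consumed samples: 15195584 | "
    "consumed tokens: 31120556032 | elapsed time per iteration (ms): 6080.7 | "
    "learning rate: 1.810E-04 | global batch size: 512 | lm loss: 2.059096E+00 | "
    "loss scale: 524288.0 | grad norm: 36877.013 | num zeros: 0.0 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)

row = parse_iteration_line(sample)
# Ratio implied by the log itself (an observation, not a stated config value).
print(row["iteration"], row["consumed tokens"] / row["consumed samples"])
```

The same function can be mapped over every `iteration` line of `logs/main_log.txt` to build a loss or grad-norm curve; lines in other formats, such as the DeepSpeed `log_dist` entry, would need their own handling.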