bigscience-bot committed on
Commit
5237fdb
1 Parent(s): 92b8bb4
Files changed (1)
  1. logs/main_log.txt +3 -0
logs/main_log.txt CHANGED
@@ -9396,3 +9396,6 @@ saving checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/checkpoin
9396
  [2021-11-23 07:37:59,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_12_mp_rank_00_optim_states.pt
9397
  successfully saved checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
9398
  time (ms) | save-checkpoint: 2754.08
9399
+ iteration 36200/ 152972 | consumed samples: 13454784 | consumed tokens: 27555397632 | elapsed time per iteration (ms): 5199.9 | learning rate: 1.850E-04 | global batch size: 512 | lm loss: 1.515898E+00 | loss scale: 131072.0 | grad norm: 10063.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
9400
+ iteration 36400/ 152972 | consumed samples: 13557184 | consumed tokens: 27765112832 | elapsed time per iteration (ms): 4641.8 | learning rate: 1.848E-04 | global batch size: 512 | lm loss: 1.632050E+00 | loss scale: 131072.0 | grad norm: 12474.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
9401
+ iteration 36600/ 152972 | consumed samples: 13659584 | consumed tokens: 27974828032 | elapsed time per iteration (ms): 4650.3 | learning rate: 1.846E-04 | global batch size: 512 | lm loss: 1.614918E+00 | loss scale: 131072.0 | grad norm: 10143.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
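
The added lines follow a regular `key: value` layout separated by `|`, so they can be turned into records for plotting or sanity checks. The following is a minimal sketch, not part of the training code: the helper names `parse_log_line` and `_to_number` are hypothetical, and the sample line is an abbreviated copy of the iteration 36200 entry above.

```python
import re


def _to_number(text):
    """Convert a log value to int when possible, otherwise to float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_log_line(line):
    """Parse one 'iteration N/ TOTAL | key: value | ...' line into a dict.

    Hypothetical helper: assumes every field after the iteration counter
    has the form 'key: numeric value', as in the lines shown in this diff.
    """
    fields = {}
    # Drop the leading '+' diff marker, then split on the '|' separators.
    for part in line.strip().lstrip("+").split("|"):
        part = part.strip()
        if not part:
            continue
        match = re.match(r"iteration\s+(\d+)\s*/\s*(\d+)$", part)
        if match:
            fields["iteration"] = int(match.group(1))
            fields["total_iterations"] = int(match.group(2))
            continue
        # Split on the last ':' so keys such as
        # 'elapsed time per iteration (ms)' stay intact.
        key, _, value = part.rpartition(":")
        fields[key.strip()] = _to_number(value)
    return fields


# Abbreviated copy of the iteration 36200 line (values taken from the log).
sample = (
    "+ iteration 36200/ 152972 | consumed samples: 13454784 | "
    "consumed tokens: 27555397632 | global batch size: 512 | "
    "lm loss: 1.515898E+00 | loss scale: 131072.0 |"
)
record = parse_log_line(sample)
tokens_per_sample = record["consumed tokens"] // record["consumed samples"]
```

For these lines the ratio of consumed tokens to consumed samples is a whole number, which is consistent with a fixed sequence length, although the log itself does not state one. Likewise, consumed samples grow by 102400 between consecutive entries 200 iterations apart, which matches the logged global batch size of 512.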