bigscience-bot committed on
Commit
a1704c8
1 Parent(s): a3b7f6f
Files changed (1)
  1. logs/main_log.txt +13 -0
logs/main_log.txt CHANGED
@@ -24224,3 +24224,16 @@ time (ms)
  time (ms)
  iteration 65600/ 152972 | consumed samples: 28507584 | elapsed time per iteration (ms): 6080.1 | learning rate: 1.381E-04 | global batch size: 512 | lm loss: 2.868213E+00 | loss scale: 1048576.0 | grad norm: 95743.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
  time (ms)
+ iteration 65800/ 152972 | consumed samples: 28609984 | elapsed time per iteration (ms): 6071.1 | learning rate: 1.377E-04 | global batch size: 512 | lm loss: 2.865917E+00 | loss scale: 262144.0 | grad norm: 24556.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ [2021-10-01 21:29:11,271] [INFO] [logging.py:68:log_dist] [Rank 0] step=66000, skipped=139, lr=[0.00013727953456626625, 0.00013727953456626625], mom=[(0.9, 0.999), (0.9, 0.999)]
+ steps: 66000 loss: 2.8560 iter time (s): 0.003 samples/sec: 171544.840
+ iteration 66000/ 152972 | consumed samples: 28712384 | elapsed time per iteration (ms): 6062.4 | learning rate: 1.373E-04 | global batch size: 512 | lm loss: 2.867659E+00 | loss scale: 262144.0 | grad norm: 25894.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ -------------------------------------------------------------------------------------------------
+ validation loss at iteration 66000 | lm loss value: 2.818860E+00 | lm loss PPL: 1.675774E+01 |
+ -------------------------------------------------------------------------------------------------
+ saving checkpoint at iteration 66000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
+ [2021-10-01 21:32:06,271] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step66000/mp_rank_00_model_states.pt
+ successfully saved checkpoint at iteration 66000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
+ time (ms) | save-checkpoint: 1460.68
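
The pipe-delimited lines added above can be read mechanically. The following is a minimal sketch, assuming the `key: value |` layout seen in this log. The helper names `parse_metrics` and `parse_iteration` are hypothetical and not part of the training code. It also cross-checks two figures from the log: that the reported PPL equals exp(lm loss), and that consumed samples advance by the global batch size per iteration.

```python
import math
import re


def parse_metrics(line):
    """Collect the numeric 'key: value' fields of a pipe-delimited log line.

    Hypothetical helper: fields without a colon (such as the leading
    'iteration N/ M' field) or with non-numeric values are skipped.
    """
    metrics = {}
    for field in line.split("|"):
        key, sep, value = field.partition(":")
        if not sep:
            continue
        try:
            metrics[key.strip()] = float(value)
        except ValueError:
            continue
    return metrics


def parse_iteration(line):
    """Return (current, total) from an 'iteration N/ M' prefix, or None."""
    match = re.search(r"iteration\s+(\d+)/\s*(\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Lines copied from the log above, shortened to the fields used here.
train_line = (
    "iteration 65800/ 152972 | consumed samples: 28609984 | "
    "global batch size: 512 | lm loss: 2.865917E+00 |"
)
valid_line = (
    "validation loss at iteration 66000 | lm loss value: 2.818860E+00 | "
    "lm loss PPL: 1.675774E+01 |"
)

train = parse_metrics(train_line)
valid = parse_metrics(valid_line)

# The reported perplexity should equal exp(validation loss).
ppl_from_loss = math.exp(valid["lm loss value"])

# 200 iterations separate the entries at 65600 and 65800.
samples_per_iteration = (train["consumed samples"] - 28507584) / 200
```

With the values in this commit, `ppl_from_loss` agrees with the logged `lm loss PPL` to the printed precision, and `samples_per_iteration` equals the logged global batch size of 512.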