Commit d909835 by bigscience-bot
Parent(s): 92b5f86
new data
logs/main_log.txt CHANGED (+3 -0)
@@ -27742,3 +27742,6 @@ saving checkpoint at iteration 93000 to /gpfsscratch/rech/six/commun/checkpoin
 [2021-11-26 15:51:47,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_1_mp_rank_01_optim_states.pt
 successfully saved checkpoint at iteration 93000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
 time (ms) | save-checkpoint: 2768.87
+ iteration 93200/ 152972 | consumed samples: 42638784 | consumed tokens: 87324229632 | elapsed time per iteration (ms): 5214.7 | learning rate: 8.141E-05 | global batch size: 512 | lm loss: 1.456170E+00 | loss scale: 65536.0 | grad norm: 6013.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ iteration 93400/ 152972 | consumed samples: 42741184 | consumed tokens: 87533944832 | elapsed time per iteration (ms): 4645.9 | learning rate: 8.100E-05 | global batch size: 512 | lm loss: 1.457245E+00 | loss scale: 65536.0 | grad norm: 5967.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ iteration 93600/ 152972 | consumed samples: 42843584 | consumed tokens: 87743660032 | elapsed time per iteration (ms): 4643.0 | learning rate: 8.060E-05 | global batch size: 512 | lm loss: 1.416812E+00 | loss scale: 131072.0 | grad norm: 12605.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
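The added iteration lines are pipe-separated `key: value` fields, so they can be read mechanically. The sketch below is one possible way to parse them and cross-check the counters. The function names `parse_iteration_line` and `_number` are hypothetical helpers introduced here for illustration and are not part of the training code. The two sample lines are copied from the log above with only the fields needed for the check.

```python
import re

# Two of the added log lines, shortened to the fields used below.
LINE_A = ("iteration 93200/ 152972 | consumed samples: 42638784 | "
          "consumed tokens: 87324229632 | global batch size: 512 | "
          "lm loss: 1.456170E+00 |")
LINE_B = ("iteration 93400/ 152972 | consumed samples: 42741184 | "
          "consumed tokens: 87533944832 | global batch size: 512 | "
          "lm loss: 1.457245E+00 |")


def _number(text):
    """Return an int when the text is an integer literal, else a float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_iteration_line(line):
    """Parse one 'iteration N/ TOTAL | key: value | ...' line into a dict.

    Hypothetical helper: assumes every field after the first has the
    form 'key: number', as in the lines shown in this diff.
    """
    # Drop a leading diff '+' marker, then split on the pipe separators.
    fields = [p.strip() for p in line.strip().lstrip("+").split("|") if p.strip()]
    head = re.match(r"iteration\s+(\d+)\s*/\s*(\d+)", fields[0])
    if head is None:
        raise ValueError("not an iteration line")
    record = {
        "iteration": int(head.group(1)),
        "total iterations": int(head.group(2)),
    }
    for field in fields[1:]:
        # Split on the last colon so keys such as 'time (ms)' stay intact.
        key, _, value = field.rpartition(":")
        record[key.strip()] = _number(value)
    return record


a = parse_iteration_line(LINE_A)
b = parse_iteration_line(LINE_B)

# Samples consumed between the two log lines.
sample_delta = b["consumed samples"] - a["consumed samples"]
# Tokens consumed between the two log lines.
token_delta = b["consumed tokens"] - a["consumed tokens"]

# The sample counter should advance by (iterations elapsed) * (batch size).
assert sample_delta == (b["iteration"] - a["iteration"]) * a["global batch size"]
print(sample_delta, token_delta // sample_delta)
```

For these two lines the sample counter advances by 102400, which matches 200 iterations at a global batch size of 512, and the token counter advances by 2048 tokens per sample.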