bigscience-bot committed
Commit 9db6399 • 1 Parent(s): 07e1063
new data
logs/main_log.txt CHANGED (+13 -0)
@@ -36573,3 +36573,16 @@ time (ms)
 time (ms)
 iteration 98800/ 152972 | consumed samples: 45505984 | elapsed time per iteration (ms): 6202.3 | learning rate: 7.032E-05 | global batch size: 512 | lm loss: 2.799443E+00 | loss scale: 524288.0 | grad norm: 50434.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 time (ms)
+iteration 99000/ 152972 | consumed samples: 45608384 | elapsed time per iteration (ms): 5940.7 | learning rate: 6.993E-05 | global batch size: 512 | lm loss: 2.802682E+00 | loss scale: 524288.0 | grad norm: 50073.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+-------------------------------------------------------------------------------------------------
+validation loss at iteration 99000 | lm loss value: 2.748792E+00 | lm loss PPL: 1.562374E+01 |
+-------------------------------------------------------------------------------------------------
+saving checkpoint at iteration 99000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
+[2021-10-04 09:35:28,488] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step99000/mp_rank_00_model_states.pt
+successfully saved checkpoint at iteration 99000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
+time (ms) | save-checkpoint: 1504.53
+iteration 99200/ 152972 | consumed samples: 45710784 | elapsed time per iteration (ms): 6868.2 | learning rate: 6.954E-05 | global batch size: 512 | lm loss: 2.800038E+00 | loss scale: 262144.0 | grad norm: 25434.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 99400/ 152972 | consumed samples: 45813184 | elapsed time per iteration (ms): 5957.7 | learning rate: 6.915E-05 | global batch size: 512 | lm loss: 2.799743E+00 | loss scale: 262144.0 | grad norm: 29201.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
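
Note: the added log lines are regular enough to sanity-check mechanically. The sketch below is not part of the training code or of this commit; `parse_iteration_line` is a hypothetical helper written only for illustration, and it assumes the `key: value | key: value |` layout seen in the lines above. It checks two relationships that the logged numbers appear to satisfy: consumed samples advance by (iterations elapsed × global batch size), and the reported validation PPL matches exp(lm loss value).

```python
import math
import re


def parse_iteration_line(line):
    """Parse one 'iteration N/ M | key: value | ...' log line into a dict.

    Hypothetical helper: assumes every field after the first is 'key: number'.
    Returns None for lines that are not iteration lines.
    """
    # Drop an optional diff '+' marker, surrounding spaces, and the trailing '|'.
    text = line.lstrip("+ \t").rstrip().rstrip("|")
    fields = [field.strip() for field in text.split("|")]
    head = re.match(r"iteration\s+(\d+)/\s*(\d+)$", fields[0])
    if head is None:
        return None
    record = {
        "iteration": int(head.group(1)),
        "total iterations": int(head.group(2)),
    }
    for field in fields[1:]:
        # Split on the last ':' so keys such as
        # 'elapsed time per iteration (ms)' stay intact.
        key, _, value = field.rpartition(":")
        record[key.strip()] = float(value)
    return record


# Two lines copied verbatim from the log above.
LINE_99000 = (
    "iteration 99000/ 152972 | consumed samples: 45608384 | "
    "elapsed time per iteration (ms): 5940.7 | learning rate: 6.993E-05 | "
    "global batch size: 512 | lm loss: 2.802682E+00 | loss scale: 524288.0 | "
    "grad norm: 50073.133 | num zeros: 0.0 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
LINE_99200 = (
    "iteration 99200/ 152972 | consumed samples: 45710784 | "
    "elapsed time per iteration (ms): 6868.2 | learning rate: 6.954E-05 | "
    "global batch size: 512 | lm loss: 2.800038E+00 | loss scale: 262144.0 | "
    "grad norm: 25434.345 | num zeros: 0.0 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)

first = parse_iteration_line(LINE_99000)
second = parse_iteration_line(LINE_99200)

# Consumed samples should grow by (iterations elapsed) * (global batch size).
steps = second["iteration"] - first["iteration"]
expected_samples = first["consumed samples"] + steps * first["global batch size"]
samples_consistent = expected_samples == second["consumed samples"]

# Validation perplexity should be exp(loss); values are from the
# 'validation loss at iteration 99000' line.
validation_loss = 2.748792
reported_ppl = 1.562374e01
ppl_consistent = math.isclose(math.exp(validation_loss), reported_ppl, rel_tol=1e-5)

print("consumed samples consistent:", samples_consistent)
print("validation PPL consistent:", ppl_consistent)
```

The same helper returns None for the `time (ms)` and checkpoint lines, so it can be mapped over a whole log and the None results filtered out.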