bigscience-bot
committed on
Commit eeea764
1 Parent(s): 28d27fa
new data
- logs/main_log.txt +11 -0
logs/main_log.txt
CHANGED
@@ -28205,3 +28205,14 @@ time (ms)
 time (ms)
 iteration 74600/ 152972 | consumed samples: 33115584 | elapsed time per iteration (ms): 6776.9 | learning rate: 1.199E-04 | global batch size: 512 | lm loss: 2.846215E+00 | loss scale: 524288.0 | grad norm: 53580.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 time (ms)
+iteration 74800/ 152972 | consumed samples: 33217984 | elapsed time per iteration (ms): 6796.4 | learning rate: 1.195E-04 | global batch size: 512 | lm loss: 2.845712E+00 | loss scale: 524288.0 | grad norm: 52668.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 75000/ 152972 | consumed samples: 33320384 | elapsed time per iteration (ms): 6791.5 | learning rate: 1.191E-04 | global batch size: 512 | lm loss: 2.846152E+00 | loss scale: 1048576.0 | grad norm: 110561.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+-------------------------------------------------------------------------------------------------
+validation loss at iteration 75000 | lm loss value: 2.795223E+00 | lm loss PPL: 1.636627E+01 |
+-------------------------------------------------------------------------------------------------
+saving checkpoint at iteration 75000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
+[2021-10-02 15:42:38,338] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step75000/mp_rank_00_model_states.pt
+successfully saved checkpoint at iteration 75000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
+time (ms) | save-checkpoint: 1703.03
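The pipe-delimited lines in this hunk can be read back into numbers with a few lines of Python. The sketch below is only an illustration, not part of the training code: `parse_fields` is a hypothetical helper, and the two sample strings are copied from the log lines added in this commit. It also checks that the reported validation PPL is consistent with the exponential of the reported validation lm loss, which is the usual definition of perplexity and is assumed here rather than confirmed by the log.

```python
import math
import re

# Sample lines copied from the hunk above.
TRAIN_LINE = (
    "iteration 75000/ 152972 | consumed samples: 33320384 | "
    "elapsed time per iteration (ms): 6791.5 | learning rate: 1.191E-04 | "
    "global batch size: 512 | lm loss: 2.846152E+00 | loss scale: 1048576.0 | "
    "grad norm: 110561.030 | num zeros: 0.0 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
VALID_LINE = (
    "validation loss at iteration 75000 | lm loss value: 2.795223E+00 | "
    "lm loss PPL: 1.636627E+01 |"
)


def parse_fields(line):
    """Hypothetical helper: turn one log line into a {name: float} dict."""
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        # The leading "iteration N/ TOTAL" cell has no colon; handle it first.
        m = re.match(r"iteration\s+(\d+)/\s*(\d+)$", part)
        if m:
            fields["iteration"] = float(m.group(1))
            fields["total iterations"] = float(m.group(2))
            continue
        # Split on the last colon so names such as
        # "elapsed time per iteration (ms)" stay intact.
        if ":" in part:
            name, _, value = part.rpartition(":")
            fields[name.strip()] = float(value)
    return fields


train = parse_fields(TRAIN_LINE)
valid = parse_fields(VALID_LINE)

# Perplexity recomputed from the loss, next to the value the log reports.
print(math.exp(valid["lm loss value"]), valid["lm loss PPL"])
```

Cells without a colon, such as the "validation loss at iteration 75000" label, are skipped by this sketch; extend the pattern if the iteration number of a validation line is needed.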