bigscience-bot
committed on
Commit
•
a1704c8
1
Parent(s):
a3b7f6f
new data
logs/main_log.txt +13 -0
@@ -24224,3 +24224,16 @@ time (ms)
 time (ms)
 iteration 65600/ 152972 | consumed samples: 28507584 | elapsed time per iteration (ms): 6080.1 | learning rate: 1.381E-04 | global batch size: 512 | lm loss: 2.868213E+00 | loss scale: 1048576.0 | grad norm: 95743.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 time (ms)
+iteration 65800/ 152972 | consumed samples: 28609984 | elapsed time per iteration (ms): 6071.1 | learning rate: 1.377E-04 | global batch size: 512 | lm loss: 2.865917E+00 | loss scale: 262144.0 | grad norm: 24556.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+[2021-10-01 21:29:11,271] [INFO] [logging.py:68:log_dist] [Rank 0] step=66000, skipped=139, lr=[0.00013727953456626625, 0.00013727953456626625], mom=[(0.9, 0.999), (0.9, 0.999)]
+steps: 66000 loss: 2.8560 iter time (s): 0.003 samples/sec: 171544.840
+iteration 66000/ 152972 | consumed samples: 28712384 | elapsed time per iteration (ms): 6062.4 | learning rate: 1.373E-04 | global batch size: 512 | lm loss: 2.867659E+00 | loss scale: 262144.0 | grad norm: 25894.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+-------------------------------------------------------------------------------------------------
+validation loss at iteration 66000 | lm loss value: 2.818860E+00 | lm loss PPL: 1.675774E+01 |
+-------------------------------------------------------------------------------------------------
+saving checkpoint at iteration 66000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
+[2021-10-01 21:32:06,271] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step66000/mp_rank_00_model_states.pt
+successfully saved checkpoint at iteration 66000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
+time (ms) | save-checkpoint: 1460.68
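The figures in these log lines can be cross-checked mechanically. What follows is a minimal sketch, not part of the training code: `parse_log_line` is a hypothetical helper that assumes the pipe-delimited `key: value` layout seen in the log, and the perplexity check assumes that `lm loss PPL` is the exponential of `lm loss value`, which the logged numbers are consistent with.

```python
import math
import re


def parse_log_line(line: str) -> dict:
    """Split a pipe-delimited 'key: value' log line into a dict.

    Hypothetical helper; assumes the field layout seen in main_log.txt.
    Values are kept as strings, except the iteration counters.
    """
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        if part.startswith("iteration"):
            # First field has the form 'iteration <current>/ <total>'.
            current, total = re.findall(r"\d+", part)
            fields["iteration"] = int(current)
            fields["total_iterations"] = int(total)
        elif ":" in part:
            key, value = part.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


# Lines copied from the log (only the fields used below are kept).
prev = parse_log_line(
    "iteration 65600/ 152972 | consumed samples: 28507584 | global batch size: 512 |"
)
curr = parse_log_line(
    "iteration 65800/ 152972 | consumed samples: 28609984 | global batch size: 512 |"
)
val = parse_log_line(
    "validation loss at iteration 66000 | lm loss value: 2.818860E+00 | lm loss PPL: 1.675774E+01 |"
)

# Samples consumed between two reports should equal steps * global batch size.
steps = curr["iteration"] - prev["iteration"]
expected_delta = steps * int(curr["global batch size"])
actual_delta = int(curr["consumed samples"]) - int(prev["consumed samples"])
samples_consistent = expected_delta == actual_delta

# Assumption: perplexity is exp(loss); compare against the logged PPL.
ppl_from_loss = math.exp(float(val["lm loss value"]))
ppl_consistent = math.isclose(
    ppl_from_loss, float(val["lm loss PPL"]), rel_tol=1e-4
)

print(samples_consistent, ppl_consistent)
```

The same helper can be applied to the `loss scale` field, which reads 1048576.0 at iteration 65600 and 262144.0 at iterations 65800 and 66000.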