bigscience-bot
committed on
Commit
•
408d03d
1
Parent(s):
0d55954
new data
logs/main_log.txt +11 -0
logs/main_log.txt
CHANGED
@@ -8090,3 +8090,14 @@ time (ms) | save-checkpoint: 1441.76
|
|
8090 |
time (ms)
|
8091 |
iteration 19800/ 152972 | consumed samples: 5057984 | elapsed time per iteration (ms): 5950.6 | learning rate: 1.979E-04 | global batch size: 512 | lm loss: 3.121297E+00 | loss scale: 524288.0 | grad norm: 53555.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
8092 |
time (ms)
|
8093 |
+
[2021-09-28 13:42:04,166] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=38, lr=[0.0001978414577067249, 0.0001978414577067249], mom=[(0.9, 0.999), (0.9, 0.999)]
|
8094 |
+
steps: 20000 loss: 3.0782 iter time (s): 0.003 samples/sec: 172555.361
|
8095 |
+
iteration 20000/ 152972 | consumed samples: 5160384 | elapsed time per iteration (ms): 5980.3 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 3.115301E+00 | loss scale: 524288.0 | grad norm: 56000.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
8096 |
+
time (ms)
|
8097 |
+
-------------------------------------------------------------------------------------------------
|
8098 |
+
validation loss at iteration 20000 | lm loss value: 3.064670E+00 | lm loss PPL: 2.142740E+01 |
|
8099 |
+
-------------------------------------------------------------------------------------------------
|
8100 |
+
iteration 20200/ 152972 | consumed samples: 5262784 | elapsed time per iteration (ms): 6830.4 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 3.113258E+00 | loss scale: 1048576.0 | grad norm: 103464.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
8101 |
+
time (ms)
|
8102 |
+
iteration 20400/ 152972 | consumed samples: 5365184 | elapsed time per iteration (ms): 5953.6 | learning rate: 1.977E-04 | global batch size: 512 | lm loss: 3.105831E+00 | loss scale: 1048576.0 | grad norm: 108251.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
8103 |
+
time (ms)
|
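The added log lines can be sanity-checked with a few lines of Python. The sketch below is not part of the training code: `parse_fields` is a hypothetical helper written for this note, and it assumes that every field follows the `name: value` pattern between pipes, as the lines in this diff do. It parses the iteration-20000 training and validation lines, recomputes perplexity on the assumption that the logged PPL is the exponential of the logged validation loss, and derives a rough throughput on the assumption that `elapsed time per iteration (ms)` is the average wall-clock time for one global batch.

```python
import math
import re

# Two log lines copied verbatim from the diff above.
TRAIN_LINE = (
    "iteration 20000/ 152972 | consumed samples: 5160384 | "
    "elapsed time per iteration (ms): 5980.3 | learning rate: 1.978E-04 | "
    "global batch size: 512 | lm loss: 3.115301E+00 | loss scale: 524288.0 | "
    "grad norm: 56000.720 | num zeros: 0.0 | number of skipped iterations: 0 | "
    "number of nan iterations: 0 |"
)
VALID_LINE = (
    "validation loss at iteration 20000 | lm loss value: 3.064670E+00 | "
    "lm loss PPL: 2.142740E+01 |"
)


def parse_fields(line):
    """Hypothetical helper: split a pipe-delimited log line into {name: float}."""
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if ":" in part:
            # Split on the last colon so names such as "... (ms)" stay intact.
            key, _, value = part.rpartition(":")
            fields[key.strip()] = float(value)
        else:
            # The leading "iteration N/ TOTAL" segment has no colon.
            match = re.match(r"iteration\s+(\d+)/\s*(\d+)", part)
            if match:
                fields["iteration"] = float(match.group(1))
                fields["total iterations"] = float(match.group(2))
    return fields


train = parse_fields(TRAIN_LINE)
valid = parse_fields(VALID_LINE)

# Assumption: logged PPL = exp(logged validation loss).
ppl = math.exp(valid["lm loss value"])

# Assumption: elapsed time is the mean wall-clock time per global batch.
seconds_per_iteration = train["elapsed time per iteration (ms)"] / 1000.0
samples_per_sec = train["global batch size"] / seconds_per_iteration

print(f"recomputed PPL: {ppl:.4f} (logged: {valid['lm loss PPL']:.4f})")
print(f"approximate throughput: {samples_per_sec:.1f} samples/s")
```

The same helper can be applied to consecutive `iteration` lines to confirm that `consumed samples` grows by the number of iterations between log lines times `global batch size`.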