bigscience-bot committed on
Commit
408d03d
1 Parent(s): 0d55954
Files changed (1)
  1. logs/main_log.txt +11 -0
logs/main_log.txt CHANGED
@@ -8090,3 +8090,14 @@ time (ms) | save-checkpoint: 1441.76
8090
  time (ms)
8091
  iteration 19800/ 152972 | consumed samples: 5057984 | elapsed time per iteration (ms): 5950.6 | learning rate: 1.979E-04 | global batch size: 512 | lm loss: 3.121297E+00 | loss scale: 524288.0 | grad norm: 53555.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
8092
  time (ms)
8093
+ [2021-09-28 13:42:04,166] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=38, lr=[0.0001978414577067249, 0.0001978414577067249], mom=[(0.9, 0.999), (0.9, 0.999)]
8094
+ steps: 20000 loss: 3.0782 iter time (s): 0.003 samples/sec: 172555.361
8095
+ iteration 20000/ 152972 | consumed samples: 5160384 | elapsed time per iteration (ms): 5980.3 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 3.115301E+00 | loss scale: 524288.0 | grad norm: 56000.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
8096
+ time (ms)
8097
+ -------------------------------------------------------------------------------------------------
8098
+ validation loss at iteration 20000 | lm loss value: 3.064670E+00 | lm loss PPL: 2.142740E+01 |
8099
+ -------------------------------------------------------------------------------------------------
8100
+ iteration 20200/ 152972 | consumed samples: 5262784 | elapsed time per iteration (ms): 6830.4 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 3.113258E+00 | loss scale: 1048576.0 | grad norm: 103464.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
8101
+ time (ms)
8102
+ iteration 20400/ 152972 | consumed samples: 5365184 | elapsed time per iteration (ms): 5953.6 | learning rate: 1.977E-04 | global batch size: 512 | lm loss: 3.105831E+00 | loss scale: 1048576.0 | grad norm: 108251.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
8103
+ time (ms)