bigscience-bot committed on
Commit
c2edfc3
1 Parent(s): 1511d6d
logs/main_log.txt CHANGED
@@ -11300,3 +11300,21 @@ time (ms)
11300
  time (ms)
11301
  iteration 391/ 159576 | consumed samples: 6256 | elapsed time per iteration (ms): 13663.4 | learning rate: 1.735E-06 | global batch size: 16 | lm loss: 7.835842E+00 | loss scale: 4096.0 | grad norm: 48700.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
11302
  time (ms)
11303
+ iteration 392/ 159576 | consumed samples: 6272 | elapsed time per iteration (ms): 13682.5 | learning rate: 1.740E-06 | global batch size: 16 | lm loss: 7.976984E+00 | loss scale: 4096.0 | grad norm: 45273.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
11304
+ time (ms)
11305
+ iteration 393/ 159576 | consumed samples: 6288 | elapsed time per iteration (ms): 13680.3 | learning rate: 1.744E-06 | global batch size: 16 | lm loss: 8.063533E+00 | loss scale: 4096.0 | grad norm: 62966.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
11306
+ time (ms)
11307
+ iteration 394/ 159576 | consumed samples: 6304 | elapsed time per iteration (ms): 14158.6 | learning rate: 1.749E-06 | global batch size: 16 | lm loss: 7.962408E+00 | loss scale: 4096.0 | grad norm: 38917.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
11308
+ time (ms)
11309
+ iteration 395/ 159576 | consumed samples: 6320 | elapsed time per iteration (ms): 13412.3 | learning rate: 1.753E-06 | global batch size: 16 | lm loss: 7.930057E+00 | loss scale: 4096.0 | grad norm: 59046.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
11310
+ time (ms)
11311
+ iteration 396/ 159576 | consumed samples: 6336 | elapsed time per iteration (ms): 13631.9 | learning rate: 1.757E-06 | global batch size: 16 | lm loss: 8.137497E+00 | loss scale: 4096.0 | grad norm: 51299.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
11312
+ time (ms)
11313
+ iteration 397/ 159576 | consumed samples: 6352 | elapsed time per iteration (ms): 13706.0 | learning rate: 1.762E-06 | global batch size: 16 | lm loss: 8.020626E+00 | loss scale: 4096.0 | grad norm: 37056.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
11314
+ time (ms)
11315
+ iteration 398/ 159576 | consumed samples: 6368 | elapsed time per iteration (ms): 14158.0 | learning rate: 1.766E-06 | global batch size: 16 | lm loss: 8.114269E+00 | loss scale: 4096.0 | grad norm: 64105.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
11316
+ time (ms)
11317
+ iteration 399/ 159576 | consumed samples: 6384 | elapsed time per iteration (ms): 13628.9 | learning rate: 1.771E-06 | global batch size: 16 | lm loss: 8.186448E+00 | loss scale: 4096.0 | grad norm: 55633.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
11318
+ time (ms)
11319
+ iteration 400/ 159576 | consumed samples: 6400 | elapsed time per iteration (ms): 13727.5 | learning rate: 1.775E-06 | global batch size: 16 | lm loss: 8.182411E+00 | loss scale: 4096.0 | grad norm: 51312.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
11320
+ time (ms)
tensorboard/events.out.tfevents.1632442278.r9i6n8.49464.0 CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:49580f7abc53b5ddf53ce62a98e87a068706051970d852b4b2638bd5ca2bd6e6
3
- size 313223
2
+ oid sha256:46d2d9f03cdd1bd0fd1bfd5d6a311d921f1e635ba550caef48e54ad1d33fc5c9
3
+ size 320893