bigscience-bot committed on
Commit
d0d5b37
1 Parent(s): 308da5b
Files changed (1)
  1. logs/main_log.txt +75 -0
logs/main_log.txt CHANGED
@@ -45505,3 +45505,78 @@ valid loss at iteration 152000 | lm loss value: 1.404997E+00 | lm loss PPL: 4.07
45505
  iteration 152200/ 152972 | consumed samples: 72846784 | consumed tokens: 149190213632 | elapsed time per iteration (ms): 5220.0 | learning rate: 1.003E-05 | global batch size: 512 | lm loss: 1.398634E+00 | loss scale: 131072.0 | grad norm: 15487.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
45506
  iteration 152400/ 152972 | consumed samples: 72949184 | consumed tokens: 149399928832 | elapsed time per iteration (ms): 4700.0 | learning rate: 1.002E-05 | global batch size: 512 | lm loss: 1.426604E+00 | loss scale: 262144.0 | grad norm: 39846.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
45507
  iteration 152600/ 152972 | consumed samples: 73051584 | consumed tokens: 149609644032 | elapsed time per iteration (ms): 4664.5 | learning rate: 1.001E-05 | global batch size: 512 | lm loss: 1.383711E+00 | loss scale: 262144.0 | grad norm: 23119.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
45508
+ iteration 152800/ 152972 | consumed samples: 73153984 | consumed tokens: 149819359232 | elapsed time per iteration (ms): 4668.8 | learning rate: 1.001E-05 | global batch size: 512 | lm loss: 1.378195E+00 | loss scale: 131072.0 | grad norm: 15325.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
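Each progress line is a run of pipe-separated `key: value` fields. The minimal Python sketch below parses such lines and derives per-interval quantities from two of them. The regexes and the helpers `parse_iteration_line` and `interval_stats` are hypothetical and are not part of Megatron-DeepSpeed. The regex assumes the field layout seen in this log.

```python
import re

# Generic "key: number" matcher for the pipe-separated progress lines above.
# Field names come from the log; the pattern itself is an assumption about its format.
FIELD_RE = re.compile(r"([a-z][a-z ()]*?):\s*([-+0-9.E]+)")
# "iteration 152800/ 152972" has no colon, so it gets its own pattern.
ITER_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)")


def parse_iteration_line(line):
    """Return the numeric fields of one training-progress line as a dict."""
    fields = {}
    m = ITER_RE.search(line)
    if m:
        fields["iteration"] = int(m.group(1))
        fields["train iters"] = int(m.group(2))
    for key, value in FIELD_RE.findall(line):
        fields[key.strip()] = float(value)
    return fields


def interval_stats(prev, curr):
    """Derive per-iteration quantities from two progress lines logged some iterations apart."""
    d_iters = curr["iteration"] - prev["iteration"]
    d_samples = curr["consumed samples"] - prev["consumed samples"]
    d_tokens = curr["consumed tokens"] - prev["consumed tokens"]
    tokens_per_sample = d_tokens / d_samples
    seconds_per_iter = curr["elapsed time per iteration (ms)"] / 1000.0
    return {
        "samples per iteration": d_samples / d_iters,
        "tokens per sample": tokens_per_sample,
        # Rough throughput: tokens in one global batch divided by the logged time per iteration.
        "tokens per second": curr["global batch size"] * tokens_per_sample / seconds_per_iter,
    }
```

Applied to the iteration 152600 and 152800 lines, the sketch finds 102400 samples over 200 iterations. That is 512 per iteration, which matches the logged global batch size. It also finds 209715200 tokens, or 2048 tokens per sample. This is consistent with a 2048-token sequence length, although that is an inference and the log does not state it. The throughput figure is only approximate, about 2.2e5 tokens/s here, because it relies on a single logged time per iteration.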
45509
+ [after training is done] datetime: 2021-11-29 23:30:49
45510
+ saving checkpoint at iteration 152972 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
45511
+ ------------------------------------------------------------------------------------------------------------
45512
+ valid loss at the end of training for val data | lm loss value: 1.404385E+00 | lm loss PPL: 4.073020E+00 |
45513
+ ------------------------------------------------------------------------------------------------------------
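The reported perplexity appears to be the exponential of the mean per-token cross-entropy loss in nats. The validation numbers above can be checked directly:

$$
\mathrm{PPL} = e^{\mathcal{L}} = e^{1.404385} \approx 4.07302
$$

This agrees with the logged 4.073020E+00 to the printed precision.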
45514
+ [2021-11-29 23:32:39,212] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/mp_rank_00_model_states.pt
45515
+ [2021-11-29 23:32:39,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_1_mp_rank_01_optim_states.pt
45516
+ [2021-11-29 23:32:39,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_1_mp_rank_00_optim_states.pt
45517
+ [2021-11-29 23:32:39,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_31_mp_rank_00_optim_states.pt
45518
+ [2021-11-29 23:32:39,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_15_mp_rank_00_optim_states.pt
45519
+ [2021-11-29 23:32:39,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_29_mp_rank_01_optim_states.pt
45520
+ [2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_21_mp_rank_01_optim_states.pt
45521
+ [2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_27_mp_rank_00_optim_states.pt
45522
+ [2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_21_mp_rank_00_optim_states.pt
45523
+ [2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_27_mp_rank_01_optim_states.pt
45524
+ [2021-11-29 23:32:39,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_24_mp_rank_00_optim_states.pt
45525
+ [2021-11-29 23:32:39,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_13_mp_rank_01_optim_states.pt
45526
+ [2021-11-29 23:32:39,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_26_mp_rank_01_optim_states.pt
45527
+ [2021-11-29 23:32:39,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_30_mp_rank_00_optim_states.pt
45528
+ [2021-11-29 23:32:39,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_18_mp_rank_00_optim_states.pt
45529
+ [2021-11-29 23:32:39,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_2_mp_rank_01_optim_states.pt
45530
+ [2021-11-29 23:32:39,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_3_mp_rank_00_optim_states.pt
45531
+ [2021-11-29 23:32:39,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_5_mp_rank_01_optim_states.pt
45532
+ [2021-11-29 23:32:39,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_6_mp_rank_01_optim_states.pt
45533
+ [2021-11-29 23:32:39,653] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_9_mp_rank_00_optim_states.pt
45534
+ [2021-11-29 23:32:39,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_8_mp_rank_00_optim_states.pt
45535
+ [2021-11-29 23:32:39,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_17_mp_rank_00_optim_states.pt
45536
+ [2021-11-29 23:32:39,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_18_mp_rank_01_optim_states.pt
45537
+ [2021-11-29 23:32:39,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_28_mp_rank_01_optim_states.pt
45538
+ [2021-11-29 23:32:39,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_15_mp_rank_01_optim_states.pt
45539
+ [2021-11-29 23:32:39,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_20_mp_rank_01_optim_states.pt
45540
+ [2021-11-29 23:32:39,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_13_mp_rank_00_optim_states.pt
45541
+ [2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_2_mp_rank_00_optim_states.pt
45542
+ [2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_28_mp_rank_00_optim_states.pt
45543
+ [2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_14_mp_rank_01_optim_states.pt
45544
+ [2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_22_mp_rank_01_optim_states.pt
45545
+ [2021-11-29 23:32:39,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_22_mp_rank_00_optim_states.pt
45546
+ [2021-11-29 23:32:39,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_19_mp_rank_01_optim_states.pt
45547
+ [2021-11-29 23:32:39,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_11_mp_rank_01_optim_states.pt
45548
+ [2021-11-29 23:32:39,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_6_mp_rank_00_optim_states.pt
45549
+ [2021-11-29 23:32:39,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_16_mp_rank_01_optim_states.pt
45550
+ [2021-11-29 23:32:39,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_7_mp_rank_00_optim_states.pt
45551
+ [2021-11-29 23:32:39,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_25_mp_rank_01_optim_states.pt
45552
+ [2021-11-29 23:32:39,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_23_mp_rank_01_optim_states.pt
45553
+ [2021-11-29 23:32:39,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_25_mp_rank_00_optim_states.pt
45554
+ [2021-11-29 23:32:39,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_29_mp_rank_00_optim_states.pt
45555
+ [2021-11-29 23:32:39,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_10_mp_rank_01_optim_states.pt
45556
+ [2021-11-29 23:32:39,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_23_mp_rank_00_optim_states.pt
45557
+ [2021-11-29 23:32:39,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_30_mp_rank_01_optim_states.pt
45558
+ [2021-11-29 23:32:39,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_24_mp_rank_01_optim_states.pt
45559
+ [2021-11-29 23:32:39,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_10_mp_rank_00_optim_states.pt
45560
+ [2021-11-29 23:32:39,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_7_mp_rank_01_optim_states.pt
45561
+ [2021-11-29 23:32:39,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_4_mp_rank_01_optim_states.pt
45562
+ [2021-11-29 23:32:39,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_26_mp_rank_00_optim_states.pt
45563
+ [2021-11-29 23:32:39,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_19_mp_rank_00_optim_states.pt
45564
+ [2021-11-29 23:32:39,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_3_mp_rank_01_optim_states.pt
45565
+ [2021-11-29 23:32:39,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_31_mp_rank_01_optim_states.pt
45566
+ [2021-11-29 23:32:39,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_12_mp_rank_01_optim_states.pt
45567
+ [2021-11-29 23:32:39,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_5_mp_rank_00_optim_states.pt
45568
+ [2021-11-29 23:32:39,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_20_mp_rank_00_optim_states.pt
45569
+ [2021-11-29 23:32:39,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_16_mp_rank_00_optim_states.pt
45570
+ [2021-11-29 23:32:39,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_11_mp_rank_00_optim_states.pt
45571
+ [2021-11-29 23:32:39,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_17_mp_rank_01_optim_states.pt
45572
+ [2021-11-29 23:32:39,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_14_mp_rank_00_optim_states.pt
45573
+ [2021-11-29 23:32:39,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_0_mp_rank_01_optim_states.pt
45574
+ [2021-11-29 23:32:39,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_4_mp_rank_00_optim_states.pt
45575
+ [2021-11-29 23:32:39,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_12_mp_rank_00_optim_states.pt
45576
+ [2021-11-29 23:32:39,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_9_mp_rank_01_optim_states.pt
45577
+ [2021-11-29 23:32:39,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_8_mp_rank_01_optim_states.pt
45578
+ [2021-11-29 23:32:39,725] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_0_mp_rank_00_optim_states.pt
45579
+ successfully saved checkpoint at iteration 152972 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
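The save step logs 64 ZeRO optimizer-state shards, one for each (`zero_pp_rank`, `mp_rank`) pair, with `zero_pp_rank` 0 to 31 and `mp_rank` 00 to 01. Because the shards are written concurrently, they appear in no particular order. The minimal sketch below checks from the log text that no shard is missing. `missing_shards` is a hypothetical helper. Its 32 × 2 default grid is read off this log rather than taken from the run's configuration.

```python
import re
from itertools import product

# Shard filenames as they appear in the "zero checkpoint saved" lines above.
SHARD_RE = re.compile(r"zero_pp_rank_(\d+)_mp_rank_(\d+)_optim_states\.pt")


def missing_shards(log_lines, n_zero_ranks=32, n_mp_ranks=2):
    """Return sorted (zero_pp_rank, mp_rank) pairs that never logged a saved shard.

    The defaults (32 x 2) are read off this particular log and are an assumption
    about the run's parallel layout, not something the log states directly.
    """
    seen = set()
    for line in log_lines:
        if "zero checkpoint saved" not in line:
            continue
        m = SHARD_RE.search(line)
        if m:
            seen.add((int(m.group(1)), int(m.group(2))))
    expected = set(product(range(n_zero_ranks), range(n_mp_ranks)))
    return sorted(expected - seen)
```

For the lines above, the function returns an empty list because all 64 pairs are present. If a rank had failed to write its shard, that rank's pair would appear in the result.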
45580
+ ------------------------------------------------------------------------------------------------------------
45581
+ test loss at the end of training for test data | lm loss value: 1.391494E+00 | lm loss PPL: 4.020852E+00 |
45582
+ ------------------------------------------------------------------------------------------------------------
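The minimal Python sketch below pulls the final metrics out of a log like this one. The regex and `parse_summary` are hypothetical helpers written against the two end-of-training summary lines (validation and test). The tolerance allows for the 7 significant digits the log prints.

```python
import math
import re

# Matches the end-of-training summary lines for the validation and test splits.
SUMMARY_RE = re.compile(
    r"(?P<split>valid|test) loss at the end of training.*?"
    r"lm loss value: (?P<loss>[-+0-9.E]+) \| lm loss PPL: (?P<ppl>[-+0-9.E]+)"
)


def parse_summary(line):
    """Return split name, loss and PPL from a summary line, or None if it does not match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    loss = float(m.group("loss"))
    ppl = float(m.group("ppl"))
    return {
        "split": m.group("split"),
        "loss": loss,
        "ppl": ppl,
        # Both values are printed to 7 significant digits, so allow a small tolerance.
        "ppl_is_exp_loss": math.isclose(ppl, math.exp(loss), rel_tol=1e-5),
    }
```

On this log, the sketch reports a test loss of 1.391494 (PPL 4.020852) and a validation loss of 1.404385 (PPL 4.073020). Both perplexities equal exp(loss) within rounding.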