bigscience-bot committed on
Commit
4a188ab
1 Parent(s): b9c1b4b
Files changed (1)
  1. logs/main_log.txt +76 -0
logs/main_log.txt CHANGED
@@ -31966,3 +31966,79 @@ valid loss at iteration 107000 | lm loss value: 1.344946E+00 | lm loss PPL: 3.83
31966
  --------------------------------------------------------------------------------------------
31967
  iteration 107200/ 152972 | consumed samples: 49806784 | consumed tokens: 102004293632 | elapsed time per iteration (ms): 5177.5 | learning rate: 5.465E-05 | global batch size: 512 | lm loss: 1.421839E+00 | loss scale: 131072.0 | grad norm: 11207.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
31968
  iteration 107400/ 152972 | consumed samples: 49909184 | consumed tokens: 102214008832 | elapsed time per iteration (ms): 4629.7 | learning rate: 5.430E-05 | global batch size: 512 | lm loss: 1.469068E+00 | loss scale: 262144.0 | grad norm: 27396.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
31969
+ iteration 107600/ 152972 | consumed samples: 50011584 | consumed tokens: 102423724032 | elapsed time per iteration (ms): 4642.6 | learning rate: 5.395E-05 | global batch size: 512 | lm loss: 1.501998E+00 | loss scale: 65536.0 | grad norm: 9313.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
31970
+ iteration 107800/ 152972 | consumed samples: 50113984 | consumed tokens: 102633439232 | elapsed time per iteration (ms): 4625.8 | learning rate: 5.360E-05 | global batch size: 512 | lm loss: 1.399211E+00 | loss scale: 65536.0 | grad norm: 6361.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
31971
+ [2021-11-27 11:39:18,380] [INFO] [logging.py:68:log_dist] [Rank 0] step=108000, skipped=226, lr=[5.324864073497269e-05, 5.324864073497269e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
31972
+ steps: 108000 loss: 1.4070 iter time (s): 0.002 samples/sec: 221110.385
31973
+ iteration 108000/ 152972 | consumed samples: 50216384 | consumed tokens: 102843154432 | elapsed time per iteration (ms): 4639.0 | learning rate: 5.325E-05 | global batch size: 512 | lm loss: 1.433393E+00 | loss scale: 65536.0 | grad norm: 7201.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
31974
+ --------------------------------------------------------------------------------------------
31975
+ valid loss at iteration 108000 | lm loss value: 1.516618E+00 | lm loss PPL: 4.556788E+00 |
31976
+ --------------------------------------------------------------------------------------------
31977
+ saving checkpoint at iteration 108000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
31978
+ [2021-11-27 11:41:08,344] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/mp_rank_00_model_states.pt
31979
+ [2021-11-27 11:41:08,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_13_mp_rank_01_optim_states.pt
31980
+ [2021-11-27 11:41:08,794] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_30_mp_rank_00_optim_states.pt
31981
+ [2021-11-27 11:41:08,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_4_mp_rank_01_optim_states.pt
31982
+ [2021-11-27 11:41:08,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_22_mp_rank_01_optim_states.pt
31983
+ [2021-11-27 11:41:08,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_3_mp_rank_00_optim_states.pt
31984
+ [2021-11-27 11:41:08,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_22_mp_rank_00_optim_states.pt
31985
+ [2021-11-27 11:41:08,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_2_mp_rank_01_optim_states.pt
31986
+ [2021-11-27 11:41:08,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_25_mp_rank_01_optim_states.pt
31987
+ [2021-11-27 11:41:08,799] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_19_mp_rank_00_optim_states.pt
31988
+ [2021-11-27 11:41:08,799] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_6_mp_rank_01_optim_states.pt
31989
+ [2021-11-27 11:41:08,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_28_mp_rank_01_optim_states.pt
31990
+ [2021-11-27 11:41:08,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_9_mp_rank_00_optim_states.pt
31991
+ [2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_14_mp_rank_00_optim_states.pt
31992
+ [2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_5_mp_rank_00_optim_states.pt
31993
+ [2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_10_mp_rank_00_optim_states.pt
31994
+ [2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_4_mp_rank_00_optim_states.pt
31995
+ [2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_29_mp_rank_01_optim_states.pt
31996
+ [2021-11-27 11:41:08,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_0_mp_rank_00_optim_states.pt
31997
+ [2021-11-27 11:41:08,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_11_mp_rank_01_optim_states.pt
31998
+ [2021-11-27 11:41:08,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_26_mp_rank_00_optim_states.pt
31999
+ [2021-11-27 11:41:08,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_12_mp_rank_00_optim_states.pt
32000
+ [2021-11-27 11:41:08,807] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_16_mp_rank_01_optim_states.pt
32001
+ [2021-11-27 11:41:08,808] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_20_mp_rank_00_optim_states.pt
32002
+ [2021-11-27 11:41:08,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_18_mp_rank_00_optim_states.pt
32003
+ [2021-11-27 11:41:08,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_18_mp_rank_01_optim_states.pt
32004
+ [2021-11-27 11:41:08,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_26_mp_rank_01_optim_states.pt
32005
+ [2021-11-27 11:41:08,813] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_0_mp_rank_01_optim_states.pt
32006
+ [2021-11-27 11:41:08,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_15_mp_rank_01_optim_states.pt
32007
+ [2021-11-27 11:41:08,821] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_20_mp_rank_01_optim_states.pt
32008
+ [2021-11-27 11:41:08,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_14_mp_rank_01_optim_states.pt
32009
+ [2021-11-27 11:41:08,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_31_mp_rank_01_optim_states.pt
32010
+ [2021-11-27 11:41:08,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_1_mp_rank_00_optim_states.pt
32011
+ [2021-11-27 11:41:08,827] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_9_mp_rank_01_optim_states.pt
32012
+ [2021-11-27 11:41:08,828] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_30_mp_rank_01_optim_states.pt
32013
+ [2021-11-27 11:41:08,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_2_mp_rank_00_optim_states.pt
32014
+ [2021-11-27 11:41:08,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_8_mp_rank_01_optim_states.pt
32015
+ [2021-11-27 11:41:08,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_27_mp_rank_01_optim_states.pt
32016
+ [2021-11-27 11:41:08,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_21_mp_rank_00_optim_states.pt
32017
+ [2021-11-27 11:41:08,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_8_mp_rank_00_optim_states.pt
32018
+ [2021-11-27 11:41:08,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_11_mp_rank_00_optim_states.pt
32019
+ [2021-11-27 11:41:08,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_27_mp_rank_00_optim_states.pt
32020
+ [2021-11-27 11:41:08,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_24_mp_rank_00_optim_states.pt
32021
+ [2021-11-27 11:41:08,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_17_mp_rank_01_optim_states.pt
32022
+ [2021-11-27 11:41:08,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_7_mp_rank_01_optim_states.pt
32023
+ [2021-11-27 11:41:08,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_19_mp_rank_01_optim_states.pt
32024
+ [2021-11-27 11:41:08,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_16_mp_rank_00_optim_states.pt
32025
+ [2021-11-27 11:41:08,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_29_mp_rank_00_optim_states.pt
32026
+ [2021-11-27 11:41:08,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_3_mp_rank_01_optim_states.pt
32027
+ [2021-11-27 11:41:08,838] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_23_mp_rank_00_optim_states.pt
32028
+ [2021-11-27 11:41:08,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_28_mp_rank_00_optim_states.pt
32029
+ [2021-11-27 11:41:08,840] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_6_mp_rank_00_optim_states.pt
32030
+ [2021-11-27 11:41:08,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_7_mp_rank_00_optim_states.pt
32031
+ [2021-11-27 11:41:08,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_17_mp_rank_00_optim_states.pt
32032
+ [2021-11-27 11:41:08,845] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_15_mp_rank_00_optim_states.pt
32033
+ [2021-11-27 11:41:08,846] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_5_mp_rank_01_optim_states.pt
32034
+ [2021-11-27 11:41:08,849] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_1_mp_rank_01_optim_states.pt
32035
+ [2021-11-27 11:41:08,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_13_mp_rank_00_optim_states.pt
32036
+ [2021-11-27 11:41:08,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_24_mp_rank_01_optim_states.pt
32037
+ [2021-11-27 11:41:08,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_23_mp_rank_01_optim_states.pt
32038
+ [2021-11-27 11:41:08,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_10_mp_rank_01_optim_states.pt
32039
+ [2021-11-27 11:41:08,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_21_mp_rank_01_optim_states.pt
32040
+ [2021-11-27 11:41:08,859] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_12_mp_rank_01_optim_states.pt
32041
+ [2021-11-27 11:41:08,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_25_mp_rank_00_optim_states.pt
32042
+ [2021-11-27 11:41:08,863] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_31_mp_rank_00_optim_states.pt
32043
+ successfully saved checkpoint at iteration 108000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
32044
+ time (ms) | save-checkpoint: 2617.82
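
The pipe-delimited metric lines above are regular enough to check mechanically. The following is a minimal sketch, not part of the training code: `parse_metrics` is a hypothetical helper, and the sequence length of 2048 is an assumption inferred from the ratio of consumed tokens to consumed samples in this log. It parses one line into a dictionary and confirms two relations visible in the numbers: the reported perplexity equals `exp(lm loss value)`, and consumed tokens equal consumed samples times the assumed sequence length.

```python
import math
import re


def parse_metrics(line: str) -> dict:
    """Parse a pipe-delimited log line into a {name: float} dict (hypothetical helper)."""
    metrics = {}
    # Drop the leading "+" that the diff view adds, then split on "|".
    for field in line.strip().lstrip("+").split("|"):
        key, sep, value = field.rpartition(":")
        if not sep:
            # Fields without a colon: only "iteration N/ TOTAL" is recognised.
            m = re.match(r"\s*iteration\s+(\d+)/\s*(\d+)", field)
            if m:
                metrics["iteration"] = float(m.group(1))
                metrics["total iterations"] = float(m.group(2))
            continue
        try:
            metrics[key.strip()] = float(value)
        except ValueError:
            pass  # Ignore fields whose value is not numeric.
    return metrics


train_line = (
    "+ iteration 108000/ 152972 | consumed samples: 50216384 | "
    "consumed tokens: 102843154432 | elapsed time per iteration (ms): 4639.0 | "
    "learning rate: 5.325E-05 | global batch size: 512 | lm loss: 1.433393E+00 |"
)
valid_line = (
    "+ valid loss at iteration 108000 | lm loss value: 1.516618E+00 | "
    "lm loss PPL: 4.556788E+00 |"
)

train = parse_metrics(train_line)
valid = parse_metrics(valid_line)

# Assumption: sequence length of 2048 tokens per sample, inferred from the log.
SEQ_LEN = 2048
tokens_match = train["consumed samples"] * SEQ_LEN == train["consumed tokens"]

# Perplexity is the exponential of the cross-entropy loss.
ppl_match = math.isclose(
    math.exp(valid["lm loss value"]), valid["lm loss PPL"], rel_tol=1e-4
)

print(tokens_match, ppl_match)
```

The same parser can be run over every `iteration` line to plot loss, loss scale, or gradient norm against consumed tokens, for example to inspect the loss-scale drop from 262144.0 to 65536.0 between iterations 107400 and 107600.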