bigscience-bot
committed on
Commit 4a188ab
1 Parent(s): b9c1b4b
new data
- logs/main_log.txt +76 -0
logs/main_log.txt
CHANGED
@@ -31966,3 +31966,79 @@ valid loss at iteration 107000 | lm loss value: 1.344946E+00 | lm loss PPL: 3.83
|
|
31966 |
--------------------------------------------------------------------------------------------
|
31967 |
iteration 107200/ 152972 | consumed samples: 49806784 | consumed tokens: 102004293632 | elapsed time per iteration (ms): 5177.5 | learning rate: 5.465E-05 | global batch size: 512 | lm loss: 1.421839E+00 | loss scale: 131072.0 | grad norm: 11207.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
31968 |
iteration 107400/ 152972 | consumed samples: 49909184 | consumed tokens: 102214008832 | elapsed time per iteration (ms): 4629.7 | learning rate: 5.430E-05 | global batch size: 512 | lm loss: 1.469068E+00 | loss scale: 262144.0 | grad norm: 27396.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
31969 |
+
iteration 107600/ 152972 | consumed samples: 50011584 | consumed tokens: 102423724032 | elapsed time per iteration (ms): 4642.6 | learning rate: 5.395E-05 | global batch size: 512 | lm loss: 1.501998E+00 | loss scale: 65536.0 | grad norm: 9313.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
31970 |
+
iteration 107800/ 152972 | consumed samples: 50113984 | consumed tokens: 102633439232 | elapsed time per iteration (ms): 4625.8 | learning rate: 5.360E-05 | global batch size: 512 | lm loss: 1.399211E+00 | loss scale: 65536.0 | grad norm: 6361.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
31971 |
+
[2021-11-27 11:39:18,380] [INFO] [logging.py:68:log_dist] [Rank 0] step=108000, skipped=226, lr=[5.324864073497269e-05, 5.324864073497269e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
|
31972 |
+
steps: 108000 loss: 1.4070 iter time (s): 0.002 samples/sec: 221110.385
|
31973 |
+
iteration 108000/ 152972 | consumed samples: 50216384 | consumed tokens: 102843154432 | elapsed time per iteration (ms): 4639.0 | learning rate: 5.325E-05 | global batch size: 512 | lm loss: 1.433393E+00 | loss scale: 65536.0 | grad norm: 7201.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
31974 |
+
--------------------------------------------------------------------------------------------
|
31975 |
+
valid loss at iteration 108000 | lm loss value: 1.516618E+00 | lm loss PPL: 4.556788E+00 |
|
31976 |
+
--------------------------------------------------------------------------------------------
|
31977 |
+
saving checkpoint at iteration 108000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
|
31978 |
+
[2021-11-27 11:41:08,344] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/mp_rank_00_model_states.pt
|
31979 |
+
[2021-11-27 11:41:08,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_13_mp_rank_01_optim_states.pt
|
31980 |
+
[2021-11-27 11:41:08,794] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_30_mp_rank_00_optim_states.pt
|
31981 |
+
[2021-11-27 11:41:08,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_4_mp_rank_01_optim_states.pt
|
31982 |
+
[2021-11-27 11:41:08,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_22_mp_rank_01_optim_states.pt
|
31983 |
+
[2021-11-27 11:41:08,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_3_mp_rank_00_optim_states.pt
|
31984 |
+
[2021-11-27 11:41:08,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_22_mp_rank_00_optim_states.pt
|
31985 |
+
[2021-11-27 11:41:08,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_2_mp_rank_01_optim_states.pt
|
31986 |
+
[2021-11-27 11:41:08,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_25_mp_rank_01_optim_states.pt
|
31987 |
+
[2021-11-27 11:41:08,799] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_19_mp_rank_00_optim_states.pt
|
31988 |
+
[2021-11-27 11:41:08,799] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_6_mp_rank_01_optim_states.pt
|
31989 |
+
[2021-11-27 11:41:08,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_28_mp_rank_01_optim_states.pt
|
31990 |
+
[2021-11-27 11:41:08,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_9_mp_rank_00_optim_states.pt
|
31991 |
+
[2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_14_mp_rank_00_optim_states.pt
|
31992 |
+
[2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_5_mp_rank_00_optim_states.pt
|
31993 |
+
[2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_10_mp_rank_00_optim_states.pt
|
31994 |
+
[2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_4_mp_rank_00_optim_states.pt
|
31995 |
+
[2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_29_mp_rank_01_optim_states.pt
|
31996 |
+
[2021-11-27 11:41:08,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_0_mp_rank_00_optim_states.pt
|
31997 |
+
[2021-11-27 11:41:08,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_11_mp_rank_01_optim_states.pt
|
31998 |
+
[2021-11-27 11:41:08,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_26_mp_rank_00_optim_states.pt
|
31999 |
+
[2021-11-27 11:41:08,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_12_mp_rank_00_optim_states.pt
|
32000 |
+
[2021-11-27 11:41:08,807] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_16_mp_rank_01_optim_states.pt
|
32001 |
+
[2021-11-27 11:41:08,808] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_20_mp_rank_00_optim_states.pt
|
32002 |
+
[2021-11-27 11:41:08,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_18_mp_rank_00_optim_states.pt
|
32003 |
+
[2021-11-27 11:41:08,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_18_mp_rank_01_optim_states.pt
|
32004 |
+
[2021-11-27 11:41:08,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_26_mp_rank_01_optim_states.pt
|
32005 |
+
[2021-11-27 11:41:08,813] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_0_mp_rank_01_optim_states.pt
|
32006 |
+
[2021-11-27 11:41:08,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_15_mp_rank_01_optim_states.pt
|
32007 |
+
[2021-11-27 11:41:08,821] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_20_mp_rank_01_optim_states.pt
|
32008 |
+
[2021-11-27 11:41:08,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_14_mp_rank_01_optim_states.pt
|
32009 |
+
[2021-11-27 11:41:08,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_31_mp_rank_01_optim_states.pt
|
32010 |
+
[2021-11-27 11:41:08,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_1_mp_rank_00_optim_states.pt
|
32011 |
+
[2021-11-27 11:41:08,827] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_9_mp_rank_01_optim_states.pt
|
32012 |
+
[2021-11-27 11:41:08,828] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_30_mp_rank_01_optim_states.pt
|
32013 |
+
[2021-11-27 11:41:08,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_2_mp_rank_00_optim_states.pt
|
32014 |
+
[2021-11-27 11:41:08,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_8_mp_rank_01_optim_states.pt
|
32015 |
+
[2021-11-27 11:41:08,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_27_mp_rank_01_optim_states.pt
|
32016 |
+
[2021-11-27 11:41:08,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_21_mp_rank_00_optim_states.pt
|
32017 |
+
[2021-11-27 11:41:08,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_8_mp_rank_00_optim_states.pt
|
32018 |
+
[2021-11-27 11:41:08,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_11_mp_rank_00_optim_states.pt
|
32019 |
+
[2021-11-27 11:41:08,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_27_mp_rank_00_optim_states.pt
|
32020 |
+
[2021-11-27 11:41:08,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_24_mp_rank_00_optim_states.pt
|
32021 |
+
[2021-11-27 11:41:08,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_17_mp_rank_01_optim_states.pt
|
32022 |
+
[2021-11-27 11:41:08,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_7_mp_rank_01_optim_states.pt
|
32023 |
+
[2021-11-27 11:41:08,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_19_mp_rank_01_optim_states.pt
|
32024 |
+
[2021-11-27 11:41:08,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_16_mp_rank_00_optim_states.pt
|
32025 |
+
[2021-11-27 11:41:08,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_29_mp_rank_00_optim_states.pt
|
32026 |
+
[2021-11-27 11:41:08,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_3_mp_rank_01_optim_states.pt
|
32027 |
+
[2021-11-27 11:41:08,838] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_23_mp_rank_00_optim_states.pt
|
32028 |
+
[2021-11-27 11:41:08,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_28_mp_rank_00_optim_states.pt
|
32029 |
+
[2021-11-27 11:41:08,840] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_6_mp_rank_00_optim_states.pt
|
32030 |
+
[2021-11-27 11:41:08,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_7_mp_rank_00_optim_states.pt
|
32031 |
+
[2021-11-27 11:41:08,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_17_mp_rank_00_optim_states.pt
|
32032 |
+
[2021-11-27 11:41:08,845] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_15_mp_rank_00_optim_states.pt
|
32033 |
+
[2021-11-27 11:41:08,846] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_5_mp_rank_01_optim_states.pt
|
32034 |
+
[2021-11-27 11:41:08,849] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_1_mp_rank_01_optim_states.pt
|
32035 |
+
[2021-11-27 11:41:08,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_13_mp_rank_00_optim_states.pt
|
32036 |
+
[2021-11-27 11:41:08,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_24_mp_rank_01_optim_states.pt
|
32037 |
+
[2021-11-27 11:41:08,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_23_mp_rank_01_optim_states.pt
|
32038 |
+
[2021-11-27 11:41:08,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_10_mp_rank_01_optim_states.pt
|
32039 |
+
[2021-11-27 11:41:08,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_21_mp_rank_01_optim_states.pt
|
32040 |
+
[2021-11-27 11:41:08,859] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_12_mp_rank_01_optim_states.pt
|
32041 |
+
[2021-11-27 11:41:08,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_25_mp_rank_00_optim_states.pt
|
32042 |
+
[2021-11-27 11:41:08,863] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_31_mp_rank_00_optim_states.pt
|
32043 |
+
successfully saved checkpoint at iteration 108000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
|
32044 |
+
time (ms) | save-checkpoint: 2617.82
|
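The metric lines in this log use a `key: value | key: value |` layout, so they can be split into fields and cross-checked. The following is a minimal sketch, not part of the training code: `parse_metrics` is a hypothetical helper name, the two sample strings are shortened copies of lines from the log above, and the 2048 tokens-per-sample figure is inferred from the ratio of the logged counters rather than stated anywhere in the log.

```python
import math
import re


def parse_metrics(line: str) -> dict:
    """Split a 'key: value | key: value |' log line into a dict of numbers.

    Hypothetical helper for illustration; segments without a numeric
    'key: value' shape (e.g. 'valid loss at iteration 108000') are skipped.
    """
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        # The first segment of a training line looks like 'iteration 107200/ 152972'.
        m = re.match(r"iteration\s+(\d+)/\s*(\d+)$", part)
        if m:
            fields["iteration"] = int(m.group(1))
            fields["total iterations"] = int(m.group(2))
            continue
        if ":" in part:
            key, value = part.rsplit(":", 1)
            fields[key.strip()] = float(value)
    return fields


# Shortened copies of two lines from the log above.
train_line = (
    "iteration 107200/ 152972 | consumed samples: 49806784 | "
    "consumed tokens: 102004293632 | elapsed time per iteration (ms): 5177.5 | "
    "learning rate: 5.465E-05 | global batch size: 512 | lm loss: 1.421839E+00 |"
)
valid_line = (
    "valid loss at iteration 108000 | lm loss value: 1.516618E+00 | "
    "lm loss PPL: 4.556788E+00 |"
)

train = parse_metrics(train_line)
valid = parse_metrics(valid_line)

# Ratio of the two counters: tokens consumed per sample (inferred, not logged).
tokens_per_sample = train["consumed tokens"] / train["consumed samples"]

# The logged perplexity should equal exp(validation loss).
ppl_from_loss = math.exp(valid["lm loss value"])

print(tokens_per_sample, ppl_from_loss, valid["lm loss PPL"])
```

If the two checks agree with the logged values, the line was parsed correctly. The same helper could be mapped over every `iteration` line to plot loss against consumed tokens.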