Commit 0df4f4d by bigscience-bot (1 parent: 48d2c00)
new data
logs/main_log.txt CHANGED +77 -0
@@ -23010,3 +23010,80 @@ valid loss at iteration 77000 | lm loss value: 1.463746E+00 | lm loss PPL: 4.322
|
|
23010 |
iteration 77400/ 152972 | consumed samples: 34549184 | consumed tokens: 70756728832 | elapsed time per iteration (ms): 4640.8 | learning rate: 1.141E-04 | global batch size: 512 | lm loss: 1.480313E+00 | loss scale: 32768.0 | grad norm: 3833.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
23011 |
iteration 77600/ 152972 | consumed samples: 34651584 | consumed tokens: 70966444032 | elapsed time per iteration (ms): 4642.4 | learning rate: 1.137E-04 | global batch size: 512 | lm loss: 1.533694E+00 | loss scale: 32768.0 | grad norm: 1919.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
23012 |
iteration 77800/ 152972 | consumed samples: 34753984 | consumed tokens: 71176159232 | elapsed time per iteration (ms): 4642.3 | learning rate: 1.133E-04 | global batch size: 512 | lm loss: 1.484447E+00 | loss scale: 32768.0 | grad norm: 3477.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
23013 |
+
[2021-11-25 19:50:54,600] [INFO] [logging.py:68:log_dist] [Rank 0] step=78000, skipped=161, lr=[0.00011287287812300848, 0.00011287287812300848], mom=[(0.9, 0.999), (0.9, 0.999)]
|
23014 |
+
steps: 78000 loss: 1.6072 iter time (s): 0.002 samples/sec: 220820.748
|
23015 |
+
iteration 78000/ 152972 | consumed samples: 34856384 | consumed tokens: 71385874432 | elapsed time per iteration (ms): 4640.2 | learning rate: 1.129E-04 | global batch size: 512 | lm loss: 1.487332E+00 | loss scale: 65536.0 | grad norm: 6999.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
23016 |
+
-------------------------------------------------------------------------------------------
|
23017 |
+
valid loss at iteration 78000 | lm loss value: 1.447810E+00 | lm loss PPL: 4.253788E+00 |
|
23018 |
+
-------------------------------------------------------------------------------------------
|
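The validation row above reports both an lm loss and a PPL. A minimal sketch of how the two figures relate, assuming the perplexity is the natural exponential of the loss (an assumption that the logged numbers appear to bear out, not something the log itself states). The function name `perplexity` is illustrative only.

```python
import math

# Values copied from the "valid loss at iteration 78000" row.
valid_lm_loss = 1.447810
reported_ppl = 4.253788


def perplexity(loss: float) -> float:
    """Perplexity under the assumption PPL = exp(loss), loss in nats."""
    return math.exp(loss)


recomputed = perplexity(valid_lm_loss)
# Relative gap between the recomputed and the logged perplexity.
relative_gap = abs(recomputed - reported_ppl) / reported_ppl
print(f"recomputed PPL: {recomputed:.6f}, relative gap: {relative_gap:.2e}")
```

The same check can be applied to the training `lm loss` values if a per-step perplexity is wanted.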
23019 |
+
saving checkpoint at iteration 78000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
|
23020 |
+
[2021-11-25 19:53:02,460] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/mp_rank_00_model_states.pt
|
23021 |
+
[2021-11-25 19:53:02,891] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_17_mp_rank_01_optim_states.pt
|
23022 |
+
[2021-11-25 19:53:02,893] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_15_mp_rank_00_optim_states.pt
|
23023 |
+
[2021-11-25 19:53:02,895] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_22_mp_rank_00_optim_states.pt
|
23024 |
+
[2021-11-25 19:53:02,897] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_14_mp_rank_00_optim_states.pt
|
23025 |
+
[2021-11-25 19:53:02,897] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_28_mp_rank_01_optim_states.pt
|
23026 |
+
[2021-11-25 19:53:02,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_11_mp_rank_00_optim_states.pt
|
23027 |
+
[2021-11-25 19:53:02,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_7_mp_rank_00_optim_states.pt
|
23028 |
+
[2021-11-25 19:53:02,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_26_mp_rank_01_optim_states.pt
|
23029 |
+
[2021-11-25 19:53:02,901] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_26_mp_rank_00_optim_states.pt
|
23030 |
+
[2021-11-25 19:53:02,901] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_31_mp_rank_00_optim_states.pt
|
23031 |
+
[2021-11-25 19:53:02,903] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_4_mp_rank_00_optim_states.pt
|
23032 |
+
[2021-11-25 19:53:02,903] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_2_mp_rank_00_optim_states.pt
|
23033 |
+
[2021-11-25 19:53:02,905] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_11_mp_rank_01_optim_states.pt
|
23034 |
+
[2021-11-25 19:53:02,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_31_mp_rank_01_optim_states.pt
|
23035 |
+
[2021-11-25 19:53:02,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_10_mp_rank_00_optim_states.pt
|
23036 |
+
[2021-11-25 19:53:02,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_3_mp_rank_01_optim_states.pt
|
23037 |
+
[2021-11-25 19:53:02,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_25_mp_rank_01_optim_states.pt
|
23038 |
+
[2021-11-25 19:53:02,907] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_21_mp_rank_01_optim_states.pt
|
23039 |
+
[2021-11-25 19:53:02,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_12_mp_rank_01_optim_states.pt
|
23040 |
+
[2021-11-25 19:53:02,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_1_mp_rank_00_optim_states.pt
|
23041 |
+
[2021-11-25 19:53:02,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_18_mp_rank_01_optim_states.pt
|
23042 |
+
[2021-11-25 19:53:02,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_20_mp_rank_01_optim_states.pt
|
23043 |
+
[2021-11-25 19:53:02,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_15_mp_rank_01_optim_states.pt
|
23044 |
+
[2021-11-25 19:53:02,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_7_mp_rank_01_optim_states.pt
|
23045 |
+
[2021-11-25 19:53:02,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_6_mp_rank_01_optim_states.pt
|
23046 |
+
[2021-11-25 19:53:02,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_16_mp_rank_00_optim_states.pt
|
23047 |
+
[2021-11-25 19:53:02,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_18_mp_rank_00_optim_states.pt
|
23048 |
+
[2021-11-25 19:53:02,920] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_24_mp_rank_00_optim_states.pt
|
23049 |
+
[2021-11-25 19:53:02,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_28_mp_rank_00_optim_states.pt
|
23050 |
+
[2021-11-25 19:53:02,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_13_mp_rank_00_optim_states.pt
|
23051 |
+
[2021-11-25 19:53:02,928] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_29_mp_rank_00_optim_states.pt
|
23052 |
+
[2021-11-25 19:53:02,928] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_4_mp_rank_01_optim_states.pt
|
23053 |
+
[2021-11-25 19:53:02,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_17_mp_rank_00_optim_states.pt
|
23054 |
+
[2021-11-25 19:53:02,930] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_9_mp_rank_00_optim_states.pt
|
23055 |
+
[2021-11-25 19:53:02,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_27_mp_rank_01_optim_states.pt
|
23056 |
+
[2021-11-25 19:53:02,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_8_mp_rank_00_optim_states.pt
|
23057 |
+
[2021-11-25 19:53:02,932] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_0_mp_rank_01_optim_states.pt
|
23058 |
+
[2021-11-25 19:53:02,934] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_30_mp_rank_01_optim_states.pt
|
23059 |
+
[2021-11-25 19:53:02,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_23_mp_rank_01_optim_states.pt
|
23060 |
+
[2021-11-25 19:53:02,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_23_mp_rank_00_optim_states.pt
|
23061 |
+
[2021-11-25 19:53:02,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_2_mp_rank_01_optim_states.pt
|
23062 |
+
[2021-11-25 19:53:02,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_6_mp_rank_00_optim_states.pt
|
23063 |
+
[2021-11-25 19:53:02,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_10_mp_rank_01_optim_states.pt
|
23064 |
+
[2021-11-25 19:53:02,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_20_mp_rank_00_optim_states.pt
|
23065 |
+
[2021-11-25 19:53:02,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_9_mp_rank_01_optim_states.pt
|
23066 |
+
[2021-11-25 19:53:02,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_25_mp_rank_00_optim_states.pt
|
23067 |
+
[2021-11-25 19:53:02,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_24_mp_rank_01_optim_states.pt
|
23068 |
+
[2021-11-25 19:53:02,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_12_mp_rank_00_optim_states.pt
|
23069 |
+
[2021-11-25 19:53:02,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_27_mp_rank_00_optim_states.pt
|
23070 |
+
[2021-11-25 19:53:02,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_0_mp_rank_00_optim_states.pt
|
23071 |
+
[2021-11-25 19:53:02,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_29_mp_rank_01_optim_states.pt
|
23072 |
+
[2021-11-25 19:53:02,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_21_mp_rank_00_optim_states.pt
|
23073 |
+
[2021-11-25 19:53:02,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_30_mp_rank_00_optim_states.pt
|
23074 |
+
[2021-11-25 19:53:02,945] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_5_mp_rank_00_optim_states.pt
|
23075 |
+
[2021-11-25 19:53:02,947] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_3_mp_rank_00_optim_states.pt
|
23076 |
+
[2021-11-25 19:53:02,949] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_22_mp_rank_01_optim_states.pt
|
23077 |
+
[2021-11-25 19:53:02,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_19_mp_rank_00_optim_states.pt
|
23078 |
+
[2021-11-25 19:53:02,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_1_mp_rank_01_optim_states.pt
|
23079 |
+
[2021-11-25 19:53:02,959] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_16_mp_rank_01_optim_states.pt
|
23080 |
+
[2021-11-25 19:53:02,961] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_14_mp_rank_01_optim_states.pt
|
23081 |
+
[2021-11-25 19:53:02,962] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_5_mp_rank_01_optim_states.pt
|
23082 |
+
[2021-11-25 19:53:02,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_19_mp_rank_01_optim_states.pt
|
23083 |
+
[2021-11-25 19:53:02,966] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_8_mp_rank_01_optim_states.pt
|
23084 |
+
[2021-11-25 19:53:02,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_13_mp_rank_01_optim_states.pt
|
23085 |
+
successfully saved checkpoint at iteration 78000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
|
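The `_save_zero_checkpoint` rows above arrive out of order, so a missing shard is hard to spot by eye. A hedged sketch of a completeness check: it extracts the two rank indices from each file name and compares them with an expected grid. The grid size (32 values for the first index, 2 for `mp_rank`) is inferred from the highest indices seen in this log and is an assumption, as are the helper names `saved_shards` and `missing_shards`.

```python
import re
from itertools import product

# Matches e.g. "zero_pp_rank_17_mp_rank_01_optim_states.pt".
SHARD_RE = re.compile(r"zero_pp_rank_(\d+)_mp_rank_(\d+)_optim_states\.pt")


def saved_shards(log_lines):
    """Return the set of (first_rank, mp_rank) pairs found in the log lines."""
    shards = set()
    for line in log_lines:
        match = SHARD_RE.search(line)
        if match:
            shards.add((int(match.group(1)), int(match.group(2))))
    return shards


def missing_shards(log_lines, n_first=32, n_mp=2):
    """List expected (first_rank, mp_rank) pairs that never appear.

    n_first and n_mp are assumptions inferred from this log, not
    values reported by the training run itself.
    """
    expected = set(product(range(n_first), range(n_mp)))
    return sorted(expected - saved_shards(log_lines))


# Usage with two file names taken from the rows above.
sample = [
    "zero checkpoint saved .../global_step78000/zero_pp_rank_17_mp_rank_01_optim_states.pt",
    "zero checkpoint saved .../global_step78000/zero_pp_rank_15_mp_rank_00_optim_states.pt",
]
print(sorted(saved_shards(sample)))
```

Running `missing_shards` over the full log should return an empty list when every expected shard was written.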
23086 |
+
time (ms) | save-checkpoint: 2747.00
|
23087 |
+
iteration 78200/ 152972 | consumed samples: 34958784 | consumed tokens: 71595589632 | elapsed time per iteration (ms): 5294.6 | learning rate: 1.125E-04 | global batch size: 512 | lm loss: 1.510645E+00 | loss scale: 65536.0 | grad norm: 5610.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
23088 |
+
iteration 78400/ 152972 | consumed samples: 35061184 | consumed tokens: 71805304832 | elapsed time per iteration (ms): 4655.1 | learning rate: 1.120E-04 | global batch size: 512 | lm loss: 1.483753E+00 | loss scale: 131072.0 | grad norm: 16549.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
23089 |
+
iteration 78600/ 152972 | consumed samples: 35163584 | consumed tokens: 72015020032 | elapsed time per iteration (ms): 4642.6 | learning rate: 1.116E-04 | global batch size: 512 | lm loss: 1.459196E+00 | loss scale: 131072.0 | grad norm: 13634.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
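The `iteration` rows share a fixed `key: value |` layout, which makes them straightforward to parse for plotting or sanity checks. The sketch below is one possible parser, not part of the training code; `parse_iteration_line` is a hypothetical helper name. As a sanity check it derives tokens per sample from the two consumed counters.

```python
import re


def parse_iteration_line(line: str) -> dict:
    """Parse one 'iteration N/ M | key: value | ...' row into a dict."""
    fields = [f.strip() for f in line.split("|") if f.strip()]
    head = re.match(r"iteration\s+(\d+)/\s*(\d+)", fields[0])
    if head is None:
        raise ValueError("not an iteration row")
    record = {
        "iteration": int(head.group(1)),
        "total iterations": int(head.group(2)),
    }
    for field in fields[1:]:
        # Split at the first colon; keys such as
        # "elapsed time per iteration (ms)" contain no colon themselves.
        key, _, value = field.partition(":")
        record[key.strip()] = float(value)
    return record


# Row copied verbatim from the log above.
row = (
    "iteration 78600/ 152972 | consumed samples: 35163584 | "
    "consumed tokens: 72015020032 | elapsed time per iteration (ms): 4642.6 | "
    "learning rate: 1.116E-04 | global batch size: 512 | "
    "lm loss: 1.459196E+00 | loss scale: 131072.0 | grad norm: 13634.901 | "
    "num zeros: 0.0 | number of skipped iterations: 0 | "
    "number of nan iterations: 0 |"
)
rec = parse_iteration_line(row)
tokens_per_sample = rec["consumed tokens"] / rec["consumed samples"]
print(rec["iteration"], rec["lm loss"], tokens_per_sample)
```

Values are stored as floats for simplicity; the counters in this log are small enough to be represented exactly.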