Commit d0d5b37
Parent(s): 308da5b
new data
logs/main_log.txt +75 -0
logs/main_log.txt
CHANGED
@@ -45505,3 +45505,78 @@ valid loss at iteration 152000 | lm loss value: 1.404997E+00 | lm loss PPL: 4.07
|
|
45505 |
iteration 152200/ 152972 | consumed samples: 72846784 | consumed tokens: 149190213632 | elapsed time per iteration (ms): 5220.0 | learning rate: 1.003E-05 | global batch size: 512 | lm loss: 1.398634E+00 | loss scale: 131072.0 | grad norm: 15487.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
45506 |
iteration 152400/ 152972 | consumed samples: 72949184 | consumed tokens: 149399928832 | elapsed time per iteration (ms): 4700.0 | learning rate: 1.002E-05 | global batch size: 512 | lm loss: 1.426604E+00 | loss scale: 262144.0 | grad norm: 39846.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
45507 |
iteration 152600/ 152972 | consumed samples: 73051584 | consumed tokens: 149609644032 | elapsed time per iteration (ms): 4664.5 | learning rate: 1.001E-05 | global batch size: 512 | lm loss: 1.383711E+00 | loss scale: 262144.0 | grad norm: 23119.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
45508 |
+
iteration 152800/ 152972 | consumed samples: 73153984 | consumed tokens: 149819359232 | elapsed time per iteration (ms): 4668.8 | learning rate: 1.001E-05 | global batch size: 512 | lm loss: 1.378195E+00 | loss scale: 131072.0 | grad norm: 15325.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
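Note: the iteration lines above are pipe-separated `key: value` records. The following is a minimal sketch for reading one into a dict, assuming every field after the `iteration N/ TOTAL` prefix is numeric; `parse_iteration_line` is a hypothetical helper, not part of the training code. Dividing consumed tokens by consumed samples gives 2048 tokens per sample for this line.

```python
import re

def parse_iteration_line(line: str) -> dict:
    """Split an 'iteration N/ TOTAL | key: value | ...' log line into a dict."""
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        # The first field has no colon: "iteration 152800/ 152972"
        m = re.match(r"iteration\s+(\d+)/\s*(\d+)$", part)
        if m:
            fields["iteration"] = int(m.group(1))
            fields["total_iterations"] = int(m.group(2))
            continue
        # Split on the last colon so keys such as
        # "elapsed time per iteration (ms)" stay intact.
        key, sep, value = part.rpartition(":")
        if not sep:
            continue  # not a key/value field, skip it
        fields[key.strip()] = float(value)
    return fields

# Line 45508 of logs/main_log.txt, copied from this diff.
sample = (
    "iteration 152800/ 152972 | consumed samples: 73153984 | "
    "consumed tokens: 149819359232 | elapsed time per iteration (ms): 4668.8 | "
    "learning rate: 1.001E-05 | global batch size: 512 | lm loss: 1.378195E+00 | "
    "loss scale: 131072.0 | grad norm: 15325.895 | num zeros: 0.0 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)

rec = parse_iteration_line(sample)
tokens_per_sample = rec["consumed tokens"] / rec["consumed samples"]
print(rec["iteration"], rec["lm loss"], tokens_per_sample)
```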
45509 |
+
[after training is done] datetime: 2021-11-29 23:30:49
|
45510 |
+
saving checkpoint at iteration 152972 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
|
45511 |
+
------------------------------------------------------------------------------------------------------------
|
45512 |
+
valid loss at the end of training for val data | lm loss value: 1.404385E+00 | lm loss PPL: 4.073020E+00 |
|
45513 |
+
------------------------------------------------------------------------------------------------------------
|
45514 |
+
[2021-11-29 23:32:39,212] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/mp_rank_00_model_states.pt
|
45515 |
+
[2021-11-29 23:32:39,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_1_mp_rank_01_optim_states.pt
|
45516 |
+
[2021-11-29 23:32:39,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_1_mp_rank_00_optim_states.pt
|
45517 |
+
[2021-11-29 23:32:39,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_31_mp_rank_00_optim_states.pt
|
45518 |
+
[2021-11-29 23:32:39,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_15_mp_rank_00_optim_states.pt
|
45519 |
+
[2021-11-29 23:32:39,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_29_mp_rank_01_optim_states.pt
|
45520 |
+
[2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_21_mp_rank_01_optim_states.pt
|
45521 |
+
[2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_27_mp_rank_00_optim_states.pt
|
45522 |
+
[2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_21_mp_rank_00_optim_states.pt
|
45523 |
+
[2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_27_mp_rank_01_optim_states.pt
|
45524 |
+
[2021-11-29 23:32:39,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_24_mp_rank_00_optim_states.pt
|
45525 |
+
[2021-11-29 23:32:39,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_13_mp_rank_01_optim_states.pt
|
45526 |
+
[2021-11-29 23:32:39,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_26_mp_rank_01_optim_states.pt
|
45527 |
+
[2021-11-29 23:32:39,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_30_mp_rank_00_optim_states.pt
|
45528 |
+
[2021-11-29 23:32:39,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_18_mp_rank_00_optim_states.pt
|
45529 |
+
[2021-11-29 23:32:39,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_2_mp_rank_01_optim_states.pt
|
45530 |
+
[2021-11-29 23:32:39,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_3_mp_rank_00_optim_states.pt
|
45531 |
+
[2021-11-29 23:32:39,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_5_mp_rank_01_optim_states.pt
|
45532 |
+
[2021-11-29 23:32:39,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_6_mp_rank_01_optim_states.pt
|
45533 |
+
[2021-11-29 23:32:39,653] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_9_mp_rank_00_optim_states.pt
|
45534 |
+
[2021-11-29 23:32:39,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_8_mp_rank_00_optim_states.pt
|
45535 |
+
[2021-11-29 23:32:39,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_17_mp_rank_00_optim_states.pt
|
45536 |
+
[2021-11-29 23:32:39,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_18_mp_rank_01_optim_states.pt
|
45537 |
+
[2021-11-29 23:32:39,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_28_mp_rank_01_optim_states.pt
|
45538 |
+
[2021-11-29 23:32:39,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_15_mp_rank_01_optim_states.pt
|
45539 |
+
[2021-11-29 23:32:39,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_20_mp_rank_01_optim_states.pt
|
45540 |
+
[2021-11-29 23:32:39,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_13_mp_rank_00_optim_states.pt
|
45541 |
+
[2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_2_mp_rank_00_optim_states.pt
|
45542 |
+
[2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_28_mp_rank_00_optim_states.pt
|
45543 |
+
[2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_14_mp_rank_01_optim_states.pt
|
45544 |
+
[2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_22_mp_rank_01_optim_states.pt
|
45545 |
+
[2021-11-29 23:32:39,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_22_mp_rank_00_optim_states.pt
|
45546 |
+
[2021-11-29 23:32:39,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_19_mp_rank_01_optim_states.pt
|
45547 |
+
[2021-11-29 23:32:39,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_11_mp_rank_01_optim_states.pt
|
45548 |
+
[2021-11-29 23:32:39,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_6_mp_rank_00_optim_states.pt
|
45549 |
+
[2021-11-29 23:32:39,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_16_mp_rank_01_optim_states.pt
|
45550 |
+
[2021-11-29 23:32:39,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_7_mp_rank_00_optim_states.pt
|
45551 |
+
[2021-11-29 23:32:39,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_25_mp_rank_01_optim_states.pt
|
45552 |
+
[2021-11-29 23:32:39,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_23_mp_rank_01_optim_states.pt
|
45553 |
+
[2021-11-29 23:32:39,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_25_mp_rank_00_optim_states.pt
|
45554 |
+
[2021-11-29 23:32:39,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_29_mp_rank_00_optim_states.pt
|
45555 |
+
[2021-11-29 23:32:39,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_10_mp_rank_01_optim_states.pt
|
45556 |
+
[2021-11-29 23:32:39,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_23_mp_rank_00_optim_states.pt
|
45557 |
+
[2021-11-29 23:32:39,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_30_mp_rank_01_optim_states.pt
|
45558 |
+
[2021-11-29 23:32:39,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_24_mp_rank_01_optim_states.pt
|
45559 |
+
[2021-11-29 23:32:39,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_10_mp_rank_00_optim_states.pt
|
45560 |
+
[2021-11-29 23:32:39,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_7_mp_rank_01_optim_states.pt
|
45561 |
+
[2021-11-29 23:32:39,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_4_mp_rank_01_optim_states.pt
|
45562 |
+
[2021-11-29 23:32:39,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_26_mp_rank_00_optim_states.pt
|
45563 |
+
[2021-11-29 23:32:39,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_19_mp_rank_00_optim_states.pt
|
45564 |
+
[2021-11-29 23:32:39,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_3_mp_rank_01_optim_states.pt
|
45565 |
+
[2021-11-29 23:32:39,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_31_mp_rank_01_optim_states.pt
|
45566 |
+
[2021-11-29 23:32:39,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_12_mp_rank_01_optim_states.pt
|
45567 |
+
[2021-11-29 23:32:39,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_5_mp_rank_00_optim_states.pt
|
45568 |
+
[2021-11-29 23:32:39,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_20_mp_rank_00_optim_states.pt
|
45569 |
+
[2021-11-29 23:32:39,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_16_mp_rank_00_optim_states.pt
|
45570 |
+
[2021-11-29 23:32:39,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_11_mp_rank_00_optim_states.pt
|
45571 |
+
[2021-11-29 23:32:39,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_17_mp_rank_01_optim_states.pt
|
45572 |
+
[2021-11-29 23:32:39,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_14_mp_rank_00_optim_states.pt
|
45573 |
+
[2021-11-29 23:32:39,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_0_mp_rank_01_optim_states.pt
|
45574 |
+
[2021-11-29 23:32:39,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_4_mp_rank_00_optim_states.pt
|
45575 |
+
[2021-11-29 23:32:39,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_12_mp_rank_00_optim_states.pt
|
45576 |
+
[2021-11-29 23:32:39,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_9_mp_rank_01_optim_states.pt
|
45577 |
+
[2021-11-29 23:32:39,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_8_mp_rank_01_optim_states.pt
|
45578 |
+
[2021-11-29 23:32:39,725] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_0_mp_rank_00_optim_states.pt
|
45579 |
+
successfully saved checkpoint at iteration 152972 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
|
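Note: the `zero checkpoint saved` lines above cover 64 files, with the first index running 0 to 31 and the second 00 to 01. The following is a minimal sketch for confirming that no shard is missing from a log excerpt, assuming only the file-name pattern visible in this log; `missing_shards` and its default sizes are hypothetical and taken from this run's output, not from any library API.

```python
import re
from itertools import product

# File-name pattern as it appears in the log lines above.
SHARD_RE = re.compile(r"zero_pp_rank_(\d+)_mp_rank_(\d+)_optim_states\.pt")

def missing_shards(log_text: str, n_first: int = 32, n_second: int = 2) -> list:
    """Return the (first, second) index pairs that never appear in log_text."""
    seen = {(int(a), int(b)) for a, b in SHARD_RE.findall(log_text)}
    expected = set(product(range(n_first), range(n_second)))
    return sorted(expected - seen)

# Two of the 64 lines, copied from this diff; a full check would pass the
# whole log excerpt instead.
demo = (
    "zero checkpoint saved .../global_step152972/zero_pp_rank_1_mp_rank_01_optim_states.pt\n"
    "zero checkpoint saved .../global_step152972/zero_pp_rank_0_mp_rank_00_optim_states.pt\n"
)
print(len(missing_shards(demo)))  # shards not present in the two-line demo
```

An empty list for the full excerpt would indicate that every expected optimizer-state file was reported as saved.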
45580 |
+
------------------------------------------------------------------------------------------------------------
|
45581 |
+
test loss at the end of training for test data | lm loss value: 1.391494E+00 | lm loss PPL: 4.020852E+00 |
|
45582 |
+
------------------------------------------------------------------------------------------------------------
|
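Note: the `lm loss PPL` figures in the validation and test lines are consistent with perplexity being the exponential of the reported loss. A minimal sketch of that check, assuming the loss is a mean negative log-likelihood in nats; the `perplexity` helper is hypothetical:

```python
import math

def perplexity(lm_loss: float) -> float:
    """Perplexity as exp(loss), assuming the loss is measured in nats."""
    return math.exp(lm_loss)

# Loss values copied from the end-of-training lines in this diff.
valid_ppl = perplexity(1.404385)  # log reports 4.073020E+00
test_ppl = perplexity(1.391494)   # log reports 4.020852E+00

print(f"valid PPL: {valid_ppl:.4f}")
print(f"test PPL:  {test_ppl:.4f}")
```

Small differences in the last digits are expected because the log prints the loss rounded to six decimals.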