diff --git "a/wandb/run-20220322_163235-2yj5gh94/files/output.log" "b/wandb/run-20220322_163235-2yj5gh94/files/output.log" --- "a/wandb/run-20220322_163235-2yj5gh94/files/output.log" +++ "b/wandb/run-20220322_163235-2yj5gh94/files/output.log" @@ -14057,3 +14057,1471 @@ {'eval_loss': 0.309664785861969, 'eval_wer': 0.09321697738992463, 'eval_runtime': 583.4014, 'eval_samples_per_second': 4.529, 'eval_steps_per_second': 0.567, 'epoch': 4.48} [INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0722, 'learning_rate': 4.040462427745664e-05, 'epoch': 4.49} +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0647, 'learning_rate': 4.023121387283236e-05, 'epoch': 4.49} +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.096, 'learning_rate': 4.005780346820808e-05, 'epoch': 4.49} +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0915, 'learning_rate': 3.988439306358381e-05, 'epoch': 4.49} +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0799, 'learning_rate': 3.971098265895953e-05, 'epoch': 4.5} +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 23:14:54,855 >> Num examples = 2642██▊ | 1992/2230 [6:40:33<51:48, 13.06s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▍ | 2006/2230 [6:55:01<2:57:32, 47.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|███████████████████████████████████████████████████████████████████▍ | 2006/2230 [6:55:01<2:57:32, 47.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0812, 'learning_rate': 3.953757225433525e-05, 'epoch': 4.5} + 90%|███████████████████████████████████████████████████████████████████▍ | 2006/2230 [6:55:01<2:57:32, 47.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▍ | 2006/2230 [6:55:01<2:57:32, 47.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▍ | 2006/2230 [6:55:01<2:57:32, 47.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▍ | 2006/2230 [6:55:01<2:57:32, 47.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0717, 'learning_rate': 3.936416184971098e-05, 'epoch': 4.5} + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0719, 'learning_rate': 3.91907514450867e-05, 'epoch': 4.5} + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0865, 'learning_rate': 3.901734104046242e-05, 'epoch': 4.5} + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0561, 'learning_rate': 3.884393063583814e-05, 'epoch': 4.51} + 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.067, 'learning_rate': 3.867052023121387e-05, 'epoch': 4.51} + 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|███████████████████████████████████████████████████████████████████▌ | 2010/2230 [6:55:52<1:17:21, 21.10s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0446, 'learning_rate': 3.849710982658959e-05, 'epoch': 4.51} + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.053, 'learning_rate': 3.832369942196531e-05, 'epoch': 4.51} + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0593, 'learning_rate': 3.815028901734104e-05, 'epoch': 4.52} + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 2012/2230 [6:56:17<1:00:09, 16.56s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▌ | 2015/2230 [6:56:54<48:40, 13.59s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|█████████████████████████████████████████████████████████████████████▌ | 2015/2230 [6:56:54<48:40, 13.59s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▌ | 2015/2230 [6:56:54<48:40, 13.59s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▌ | 2015/2230 [6:56:54<48:40, 13.59s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▌ | 2015/2230 [6:56:54<48:40, 13.59s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:29:41,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:29:41,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0629, 'learning_rate': 3.780346820809248e-05, 'epoch': 4.52} +[WARNING|modeling_utils.py:388] 2022-03-22 23:29:41,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:29:41,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:29:41,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:29:41,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:29:41,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▋ | 2017/2230 [6:57:16<44:03, 12.41s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▋ | 2017/2230 [6:57:16<44:03, 12.41s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▋ | 2017/2230 [6:57:16<44:03, 12.41s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|█████████████████████████████████████████████████████████████████████▋ | 2017/2230 [6:57:16<44:03, 12.41s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▋ | 2017/2230 [6:57:16<44:03, 12.41s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▋ | 2017/2230 [6:57:16<44:03, 12.41s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▋ | 2017/2230 [6:57:16<44:03, 12.41s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0547, 'learning_rate': 3.745664739884393e-05, 'epoch': 4.52} + 90%|█████████████████████████████████████████████████████████████████████▋ | 2017/2230 [6:57:16<44:03, 12.41s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▋ | 2017/2230 [6:57:16<44:03, 12.41s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:12,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:30:12,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:12,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0735, 'learning_rate': 3.728323699421965e-05, 'epoch': 4.53} +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:12,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:12,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:22,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:22,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:22,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0501, 'learning_rate': 3.710982658959537e-05, 'epoch': 4.53} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:29,023 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:29,023 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:32,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:32,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:32,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:32,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0683, 'learning_rate': 3.6936416184971096e-05, 'epoch': 4.53} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:32,876 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:32,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:44,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:44,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0832, 'learning_rate': 3.6763005780346816e-05, 'epoch': 4.53} +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:44,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:50,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:30:50,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:55,326 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:55,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0498, 'learning_rate': 3.658959537572254e-05, 'epoch': 4.54} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:30:55,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:31:01,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:31:03,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|█████████████████████████████████████████████████████████████████████▉ | 2024/2230 [6:58:28<34:44, 10.12s/it] Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|█████████████████████████████████████████████████████████████████████▉ | 2024/2230 [6:58:28<34:44, 10.12s/it] Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:31:07,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:09,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:11,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:11,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:11,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:11,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:17,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:31:20,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:21,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:23,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:23,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:25,966 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:27,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:29,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:31:31,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:31,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:33,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:35,266 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:36,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:38,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:38,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:31:41,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:43,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:44,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:44,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:47,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:49,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:52,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:31:52,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:53,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:55,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:55,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:57,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:59,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:31:59,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:32:01,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:03,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:03,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:06,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:06,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:06,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1192, 'learning_rate': 3.4682080924855485e-05, 'epoch': 4.56} +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:10,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:32:14,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:14,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:18,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:18,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:21,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:21,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1441, 'learning_rate': 3.450867052023121e-05, 'epoch': 4.56} +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:25,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:32:28,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:28,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:32,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:32,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:35,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:35,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1257, 'learning_rate': 3.433526011560693e-05, 'epoch': 4.57} +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:39,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:32:42,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:42,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:46,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:46,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:49,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:49,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:53,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:32:53,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:53,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:53,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:53,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:53,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:32:53,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1209, 'learning_rate': 3.38150289017341e-05, 'epoch': 4.57} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0767, 'learning_rate': 3.364161849710982e-05, 'epoch': 4.57} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0638, 'learning_rate': 3.346820809248554e-05, 'epoch': 4.58} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0763, 'learning_rate': 3.329479768786127e-05, 'epoch': 4.58} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1027, 'learning_rate': 3.312138728323699e-05, 'epoch': 4.58} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0879, 'learning_rate': 3.294797687861271e-05, 'epoch': 4.58} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.072, 'learning_rate': 3.277456647398844e-05, 'epoch': 4.59} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0753, 'learning_rate': 3.260115606936416e-05, 'epoch': 4.59} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0689, 'learning_rate': 3.242774566473988e-05, 'epoch': 4.59} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.071, 'learning_rate': 3.22543352601156e-05, 'epoch': 4.59} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0758, 'learning_rate': 3.208092485549133e-05, 'epoch': 4.59} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0574, 'learning_rate': 3.190751445086705e-05, 'epoch': 4.6} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0876, 'learning_rate': 3.173410404624277e-05, 'epoch': 4.6} + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▎ | 2038/2230 [7:00:29<39:34, 12.37s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 92%|██████████████████████████████████████████████████████████████████████▊ | 2052/2230 [7:03:35<38:48, 13.08s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▊ | 2052/2230 [7:03:35<38:48, 13.08s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▊ | 2052/2230 [7:03:35<38:48, 13.08s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▊ | 2052/2230 [7:03:35<38:48, 13.08s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▊ | 2052/2230 [7:03:35<38:48, 13.08s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▊ | 2052/2230 [7:03:35<38:48, 13.08s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▊ | 2052/2230 [7:03:35<38:48, 13.08s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 92%|██████████████████████████████████████████████████████████████████████▉ | 2053/2230 [7:03:47<38:01, 12.89s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2053/2230 [7:03:47<38:01, 12.89s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2053/2230 [7:03:47<38:01, 12.89s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2053/2230 [7:03:47<38:01, 12.89s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2053/2230 [7:03:47<38:01, 12.89s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2053/2230 [7:03:47<38:01, 12.89s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2053/2230 [7:03:47<38:01, 12.89s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 92%|██████████████████████████████████████████████████████████████████████▉ | 2054/2230 [7:04:00<37:22, 12.74s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2054/2230 [7:04:00<37:22, 12.74s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2054/2230 [7:04:00<37:22, 12.74s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2054/2230 [7:04:00<37:22, 12.74s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2054/2230 [7:04:00<37:22, 12.74s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2054/2230 [7:04:00<37:22, 12.74s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2055/2230 [7:04:12<36:48, 12.62s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 92%|██████████████████████████████████████████████████████████████████████▉ | 2055/2230 [7:04:12<36:48, 12.62s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.067, 'learning_rate': 3.1040462427745667e-05, 'epoch': 4.61} + 92%|██████████████████████████████████████████████████████████████████████▉ | 2055/2230 [7:04:12<36:48, 12.62s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2055/2230 [7:04:12<36:48, 12.62s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2055/2230 [7:04:12<36:48, 12.62s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2055/2230 [7:04:12<36:48, 12.62s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0614, 'learning_rate': 3.086705202312139e-05, 'epoch': 4.61} + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0644, 'learning_rate': 3.069364161849711e-05, 'epoch': 4.61} + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0702, 'learning_rate': 3.052023121387283e-05, 'epoch': 4.61} + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0629, 'learning_rate': 3.0346820809248553e-05, 'epoch': 4.62} + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0638, 'learning_rate': 3.0173410404624277e-05, 'epoch': 4.62} + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0759, 'learning_rate': 2.9999999999999997e-05, 'epoch': 4.62} + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▉ | 2056/2230 [7:04:24<36:11, 12.48s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0786, 'learning_rate': 2.982658959537572e-05, 'epoch': 4.62} + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0659, 'learning_rate': 2.9653179190751446e-05, 'epoch': 4.63} + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [7:05:36<33:30, 11.97s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:38:40,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:38:40,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 23:38:40,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:38:46,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:38:46,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:38:46,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0729, 'learning_rate': 2.930635838150289e-05, 'epoch': 4.63} +[WARNING|modeling_utils.py:388] 2022-03-22 23:38:46,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:38:46,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:38:46,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:38:46,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [7:06:24<31:56, 11.69s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [7:06:24<31:56, 11.69s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0761, 'learning_rate': 2.9132947976878608e-05, 'epoch': 4.63} + 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [7:06:24<31:56, 11.69s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [7:06:24<31:56, 11.69s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [7:06:24<31:56, 11.69s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [7:06:24<31:56, 11.69s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [7:06:24<31:56, 11.69s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0656, 'learning_rate': 2.895953757225433e-05, 'epoch': 4.63} + 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [7:06:24<31:56, 11.69s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [7:06:24<31:56, 11.69s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [7:06:24<31:56, 11.69s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:21,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:21,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:21,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0555, 'learning_rate': 2.8786127167630052e-05, 'epoch': 4.64} +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:21,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:21,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:31,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:31,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:31,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0612, 'learning_rate': 2.8612716763005776e-05, 'epoch': 4.64} +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:31,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:39:31,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:31,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:43,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:43,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0519, 'learning_rate': 2.8439306358381497e-05, 'epoch': 4.64} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:39:48,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:39:48,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:39:48,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:39:53,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:53,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0574, 'learning_rate': 2.826589595375722e-05, 'epoch': 4.64} +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:53,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:59,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:39:59,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▌ | 2072/2230 [7:07:26<27:27, 10.43s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▌ | 2072/2230 [7:07:26<27:27, 10.43s/it]g-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:40:05,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:05,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:10,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:10,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:14,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:14,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0728, 'learning_rate': 2.7919075144508666e-05, 'epoch': 4.65} +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:14,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:40:20,265 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:22,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:22,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0571, 'learning_rate': 2.774566473988439e-05, 'epoch': 4.65} +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:26,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:28,220 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:28,220 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:28,220 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:40:28,220 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0653, 'learning_rate': 2.757225433526011e-05, 'epoch': 4.65} +[WARNING|modeling_utils.py:388] 2022-03-22 23:40:28,220 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:38,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:40,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:10:18,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▋ | 2076/2230 [7:08:05<24:47, 9.66s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:40:42,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▋ | 2076/2230 [7:08:05<24:47, 9.66s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:40:42,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:44,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:40:42,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:46,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:40:42,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:48,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:40:42,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▋ | 2077/2230 [7:08:12<23:09, 9.08s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:40:50,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▋ | 2077/2230 [7:08:12<23:09, 9.08s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:40:50,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:51,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:40:50,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:53,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:40:50,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:55,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:40:50,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 23:40:55,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:40:50,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 2078/2230 [7:08:20<21:37, 8.54s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:40:57,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:00,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:40:57,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:02,166 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:40:57,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:02,166 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:40:57,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 2079/2230 [7:08:26<20:01, 7.95s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:41:03,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:05,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:03,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:08,309 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:03,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:08,309 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:03,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 2080/2230 [7:08:32<18:26, 7.38s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:41:09,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:12,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:09,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 2081/2230 [7:08:38<16:42, 6.73s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:41:14,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 2081/2230 [7:08:38<16:42, 6.73s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:41:14,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:16,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:14,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|███████████████████████████████████████████████████████████████████████▉ | 2082/2230 [7:08:42<14:54, 6.05s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:14,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▉ | 2082/2230 [7:08:42<14:54, 6.05s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:14,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:20,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:19,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▉ | 2083/2230 [7:08:46<13:15, 5.41s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:19,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▉ | 2083/2230 [7:08:46<13:15, 5.41s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:19,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:24,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:23,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▉ | 2084/2230 [7:08:49<11:40, 4.80s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:23,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|███████████████████████████████████████████████████████████████████████▉ | 2084/2230 [7:08:49<11:40, 4.80s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:23,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▉ | 2084/2230 [7:08:49<11:40, 4.80s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:41:27,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:31,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:27,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:31,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:27,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:34,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:27,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:34,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:27,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:38,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:27,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|███████████████████████████████████████████████████████████████████████▉ | 2085/2230 [7:09:04<18:33, 7.68s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:27,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▉ | 2085/2230 [7:09:04<18:33, 7.68s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:27,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▉ | 2085/2230 [7:09:04<18:33, 7.68s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:41:41,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▉ | 2085/2230 [7:09:04<18:33, 7.68s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:41:41,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:45,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:41,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:48,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:41,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:48,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:41,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:52,281 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:41:41,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 2086/2230 [7:09:18<22:58, 9.57s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:41,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 2086/2230 [7:09:18<22:58, 9.57s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:41,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 2086/2230 [7:09:18<22:58, 9.57s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:41:55,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:59,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:55,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:41:59,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:55,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:02,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:55,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:02,812 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:41:55,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:06,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:41:55,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 2087/2230 [7:09:32<26:01, 10.92s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:55,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 2087/2230 [7:09:32<26:01, 10.92s/it] Setting `use_cache=False`...1] 2022-03-22 23:41:55,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 2087/2230 [7:09:32<26:01, 10.92s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0872, 'learning_rate': 2.5317919075144507e-05, 'epoch': 4.68} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:42:13,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1099, 'learning_rate': 2.514450867052023e-05, 'epoch': 4.68} + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.094, 'learning_rate': 2.497109826589595e-05, 'epoch': 4.69} + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1113, 'learning_rate': 2.4797687861271675e-05, 'epoch': 4.69} + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0828, 'learning_rate': 2.4624277456647396e-05, 'epoch': 4.69} + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0799, 'learning_rate': 2.445086705202312e-05, 'epoch': 4.69} + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0837, 'learning_rate': 2.427745664739884e-05, 'epoch': 4.7} + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0823, 'learning_rate': 2.4104046242774565e-05, 'epoch': 4.7} + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0878, 'learning_rate': 2.393063583815029e-05, 'epoch': 4.7} + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 2089/2230 [7:10:01<29:52, 12.71s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0632, 'learning_rate': 2.375722543352601e-05, 'epoch': 4.7} + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0509, 'learning_rate': 2.3583815028901734e-05, 'epoch': 4.7} + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0624, 'learning_rate': 2.3410404624277454e-05, 'epoch': 4.71} + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0719, 'learning_rate': 2.323699421965318e-05, 'epoch': 4.71} + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0763, 'learning_rate': 2.30635838150289e-05, 'epoch': 4.71} + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 2097/2230 [7:11:47<29:04, 13.12s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2102/2230 [7:12:53<27:52, 13.07s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2102/2230 [7:12:53<27:52, 13.07s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0815, 'learning_rate': 2.2890173410404623e-05, 'epoch': 4.71} + 94%|████████████████████████████████████████████████████████████████████████▌ | 2102/2230 [7:12:53<27:52, 13.07s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2102/2230 [7:12:53<27:52, 13.07s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▌ | 2102/2230 [7:12:53<27:52, 13.07s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2102/2230 [7:12:53<27:52, 13.07s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0548, 'learning_rate': 2.2716763005780347e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0742, 'learning_rate': 2.2543352601156068e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0509, 'learning_rate': 2.2369942196531792e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0468, 'learning_rate': 2.2196531791907513e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0499, 'learning_rate': 2.2023121387283237e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0682, 'learning_rate': 2.184971098265896e-05, 'epoch': 4.73} + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▌ | 2103/2230 [7:13:05<27:16, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.054, 'learning_rate': 2.167630057803468e-05, 'epoch': 4.73} + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0559, 'learning_rate': 2.1502890173410405e-05, 'epoch': 4.73} + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0495, 'learning_rate': 2.1329479768786126e-05, 'epoch': 4.73} + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0674, 'learning_rate': 2.115606936416185e-05, 'epoch': 4.74} + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▊ | 2109/2230 [7:14:19<24:45, 12.28s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0673, 'learning_rate': 2.098265895953757e-05, 'epoch': 4.74} + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0707, 'learning_rate': 2.080924855491329e-05, 'epoch': 4.74} + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0635, 'learning_rate': 2.0635838150289012e-05, 'epoch': 4.74} + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▉ | 2113/2230 [7:15:08<24:18, 12.46s/it] Setting `use_cache=False`...1] 2022-03-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:22,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:22,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:26,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:26,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:48:30,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:30,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0424, 'learning_rate': 2.028901734104046e-05, 'epoch': 4.75} +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:34,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:34,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:48:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0546, 'learning_rate': 2.011560693641618e-05, 'epoch': 4.75} +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0524, 'learning_rate': 1.9942196531791905e-05, 'epoch': 4.75} +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:48:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:48:58,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:49:00,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:49:00,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0984, 'learning_rate': 1.9768786127167626e-05, 'epoch': 4.75} +[WARNING|modeling_utils.py:388] 2022-03-22 23:49:04,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:49:04,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:49:04,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:49:04,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:49:04,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:42:09,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|█████████████████████████████████████████████████████████████████████████▏ | 2121/2230 [7:16:35<19:25, 10.69s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|█████████████████████████████████████████████████████████████████████████▏ | 2121/2230 [7:16:35<19:25, 10.69s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|█████████████████████████████████████████████████████████████████████████▏ | 2121/2230 [7:16:35<19:25, 10.69s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:19,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:19,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:19,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0632, 'learning_rate': 1.942196531791907e-05, 'epoch': 4.76} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:19,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:49:27,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:49:27,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:31,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:31,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0498, 'learning_rate': 1.9248554913294795e-05, 'epoch': 4.76} +[WARNING|modeling_utils.py:388] 2022-03-22 23:49:35,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:49:35,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:39,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|█████████████████████████████████████████████████████████████████████████▎ | 2124/2230 [7:17:04<17:33, 9.94s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|█████████████████████████████████████████████████████████████████████████▎ | 2124/2230 [7:17:04<17:33, 9.94s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0575, 'learning_rate': 1.907514450867052e-05, 'epoch': 4.76} + 95%|█████████████████████████████████████████████████████████████████████████▎ | 2124/2230 [7:17:04<17:33, 9.94s/it][WARNING|modeling_bart.py:1051] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:47,452 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:49,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:49,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:49,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.049, 'learning_rate': 1.890173410404624e-05, 'epoch': 4.76} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:55,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:57,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:59,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:49:59,679 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:01,734 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:03,595 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:05,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:07,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:07,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:09,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:10,897 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:12,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:12,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:15,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:17,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:19,149 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:19,149 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:22,189 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:23,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:26,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:26,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:27,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:30,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:30,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:32,436 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:34,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:34,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:35,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:38,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:38,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:40,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:41,995 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:41,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:42,721 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:46,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:46,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:49,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:49,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:56,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:50:56,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1591, 'learning_rate': 1.7167630057803466e-05, 'epoch': 4.79} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:00,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:00,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:03,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:07,358 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:07,358 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:10,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:10,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1144, 'learning_rate': 1.699421965317919e-05, 'epoch': 4.79} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:17,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:21,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:21,234 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:24,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:24,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1283, 'learning_rate': 1.682080924855491e-05, 'epoch': 4.79} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:28,149 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:31,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:31,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.102, 'learning_rate': 1.6647398843930635e-05, 'epoch': 4.79} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0936, 'learning_rate': 1.6473988439306356e-05, 'epoch': 4.8} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0992, 'learning_rate': 1.630057803468208e-05, 'epoch': 4.8} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0878, 'learning_rate': 1.61271676300578e-05, 'epoch': 4.8} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:51:36,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0724, 'learning_rate': 1.5953757225433525e-05, 'epoch': 4.8} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0741, 'learning_rate': 1.578034682080925e-05, 'epoch': 4.8} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0743, 'learning_rate': 1.560693641618497e-05, 'epoch': 4.81} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0643, 'learning_rate': 1.5433526011560694e-05, 'epoch': 4.81} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.087, 'learning_rate': 1.5260115606936414e-05, 'epoch': 4.81} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0482, 'learning_rate': 1.5086705202312138e-05, 'epoch': 4.81} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0634, 'learning_rate': 1.491329479768786e-05, 'epoch': 4.82} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0908, 'learning_rate': 1.4739884393063583e-05, 'epoch': 4.82} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0669, 'learning_rate': 1.4566473988439304e-05, 'epoch': 4.82} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0904, 'learning_rate': 1.4393063583815026e-05, 'epoch': 4.82} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0609, 'learning_rate': 1.4219653179190749e-05, 'epoch': 4.83} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2142/2230 [7:19:58<19:18, 13.17s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▎ | 2153/2230 [7:22:21<16:20, 12.74s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▎ | 2153/2230 [7:22:21<16:20, 12.74s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0588, 'learning_rate': 1.4046242774566473e-05, 'epoch': 4.83} + 97%|██████████████████████████████████████████████████████████████████████████▎ | 2153/2230 [7:22:21<16:20, 12.74s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▎ | 2153/2230 [7:22:21<16:20, 12.74s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▎ | 2153/2230 [7:22:21<16:20, 12.74s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▎ | 2153/2230 [7:22:21<16:20, 12.74s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2154/2230 [7:22:33<15:58, 12.62s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2154/2230 [7:22:33<15:58, 12.62s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.082, 'learning_rate': 1.3872832369942195e-05, 'epoch': 4.83} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2154/2230 [7:22:33<15:58, 12.62s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2154/2230 [7:22:33<15:58, 12.62s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2154/2230 [7:22:33<15:58, 12.62s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2154/2230 [7:22:33<15:58, 12.62s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0537, 'learning_rate': 1.3699421965317917e-05, 'epoch': 4.83} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0742, 'learning_rate': 1.352601156069364e-05, 'epoch': 4.83} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2155/2230 [7:22:45<15:36, 12.48s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0743, 'learning_rate': 1.3179190751445084e-05, 'epoch': 4.84} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0412, 'learning_rate': 1.3005780346820809e-05, 'epoch': 4.84} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:23:10<14:58, 12.31s/it] Setting `use_cache=False`...1] 2022-03-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:56:22,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:56:22,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0736, 'learning_rate': 1.2832369942196531e-05, 'epoch': 4.84} +[WARNING|modeling_utils.py:388] 2022-03-22 23:56:22,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:56:22,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:56:22,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:56:22,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0472, 'learning_rate': 1.2658959537572253e-05, 'epoch': 4.85} + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0592, 'learning_rate': 1.2485549132947976e-05, 'epoch': 4.85} + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.066, 'learning_rate': 1.2312138728323698e-05, 'epoch': 4.85} + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 2161/2230 [7:23:57<13:44, 11.94s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▋ | 2164/2230 [7:24:34<13:19, 12.11s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▋ | 2164/2230 [7:24:34<13:19, 12.11s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.061, 'learning_rate': 1.213872832369942e-05, 'epoch': 4.85} + 97%|██████████████████████████████████████████████████████████████████████████▋ | 2164/2230 [7:24:34<13:19, 12.11s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▋ | 2164/2230 [7:24:34<13:19, 12.11s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▋ | 2164/2230 [7:24:34<13:19, 12.11s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▋ | 2164/2230 [7:24:34<13:19, 12.11s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▋ | 2164/2230 [7:24:34<13:19, 12.11s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▋ | 2164/2230 [7:24:34<13:19, 12.11s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0519, 'learning_rate': 1.1965317919075144e-05, 'epoch': 4.85} +[WARNING|modeling_utils.py:388] 2022-03-22 23:57:27,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:57:27,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:57:27,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▊ | 2166/2230 [7:24:56<12:27, 11.69s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▊ | 2166/2230 [7:24:56<12:27, 11.69s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0337, 'learning_rate': 1.1791907514450867e-05, 'epoch': 4.86} + 97%|██████████████████████████████████████████████████████████████████████████▊ | 2166/2230 [7:24:56<12:27, 11.69s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0496, 'learning_rate': 1.161849710982659e-05, 'epoch': 4.86} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0457, 'learning_rate': 1.1445086705202312e-05, 'epoch': 4.86} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:57:39,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:03,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:03,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:58:03,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:07,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:07,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:07,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:14,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:14,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:14,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.085, 'learning_rate': 1.1098265895953756e-05, 'epoch': 4.87} +[WARNING|modeling_bart.py:1051] 2022-03-22 23:58:20,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:58:20,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 23:58:20,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▉ | 2171/2230 [7:25:49<10:25, 10.61s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▉ | 2171/2230 [7:25:49<10:25, 10.61s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:28,350 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:28,350 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:58:28,350 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:34,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:34,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0437, 'learning_rate': 1.0751445086705203e-05, 'epoch': 4.87} +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:34,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:40,420 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:42,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:42,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:58:42,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0581, 'learning_rate': 1.0578034682080925e-05, 'epoch': 4.87} +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:48,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:50,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:53,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:53,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0467, 'learning_rate': 1.0404624277456646e-05, 'epoch': 4.87} +[WARNING|modeling_utils.py:388] 2022-03-22 23:58:53,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:58:58,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:00,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:00,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:00,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0702, 'learning_rate': 1.0231213872832368e-05, 'epoch': 4.88} +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:06,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:08,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:10,508 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:59:10,508 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:12,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:14,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:16,244 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:18,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:18,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:19,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:59:21,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:24,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:24,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:26,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:28,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:31,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:32,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:59:32,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:34,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:37,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:37,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:38,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:39,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:43,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:59:43,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:44,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:46,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:46,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:48,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:50,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:50,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 23:59:52,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:55,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:55,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.061, 'learning_rate': 8.670520231213871e-06, 'epoch': 4.9} +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:59,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 23:59:59,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:02,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:02,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:00:06,154 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:09,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:09,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.111, 'learning_rate': 8.497109826589595e-06, 'epoch': 4.9} +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:13,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:13,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:16,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:16,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:00:20,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:23,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:23,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0942, 'learning_rate': 8.323699421965318e-06, 'epoch': 4.9} +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:27,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:27,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:30,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:34,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:00:34,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:34,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:37,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:37,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:40,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:40,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:46,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:00:46,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:46,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:46,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:46,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0627, 'learning_rate': 7.976878612716762e-06, 'epoch': 4.91} +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:46,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:46,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:46,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:00:46,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:00:46,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0891, 'learning_rate': 7.803468208092485e-06, 'epoch': 4.91} + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0673, 'learning_rate': 7.630057803468207e-06, 'epoch': 4.91} + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.087, 'learning_rate': 7.45664739884393e-06, 'epoch': 4.91} + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0792, 'learning_rate': 7.283236994219652e-06, 'epoch': 4.91} + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0868, 'learning_rate': 7.109826589595374e-06, 'epoch': 4.92} + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0859, 'learning_rate': 6.9364161849710975e-06, 'epoch': 4.92} + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0631, 'learning_rate': 6.76300578034682e-06, 'epoch': 4.92} + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [7:28:30<08:38, 12.64s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:30:02<07:21, 12.97s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:30:02<07:21, 12.97s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0661, 'learning_rate': 6.589595375722542e-06, 'epoch': 4.92} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:30:02<07:21, 12.97s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:30:02<07:21, 12.97s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:30:02<07:21, 12.97s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:30:02<07:21, 12.97s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0472, 'learning_rate': 6.4161849710982654e-06, 'epoch': 4.93} + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0575, 'learning_rate': 6.242774566473988e-06, 'epoch': 4.93} + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0572, 'learning_rate': 6.06936416184971e-06, 'epoch': 4.93} + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0756, 'learning_rate': 5.895953757225433e-06, 'epoch': 4.93} + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▊ | 2197/2230 [7:30:14<07:06, 12.91s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▉ | 2201/2230 [7:31:07<06:21, 13.15s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▉ | 2201/2230 [7:31:07<06:21, 13.15s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▉ | 2201/2230 [7:31:07<06:21, 13.15s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▉ | 2201/2230 [7:31:07<06:21, 13.15s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▉ | 2201/2230 [7:31:07<06:21, 13.15s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▉ | 2201/2230 [7:31:07<06:21, 13.15s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2202/2230 [7:31:19<06:01, 12.92s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2202/2230 [7:31:19<06:01, 12.92s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0564, 'learning_rate': 5.549132947976878e-06, 'epoch': 4.94} + 99%|████████████████████████████████████████████████████████████████████████████ | 2202/2230 [7:31:19<06:01, 12.92s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|████████████████████████████████████████████████████████████████████████████ | 2202/2230 [7:31:19<06:01, 12.92s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2202/2230 [7:31:19<06:01, 12.92s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2202/2230 [7:31:19<06:01, 12.92s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0701, 'learning_rate': 5.375722543352601e-06, 'epoch': 4.94} + 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0484, 'learning_rate': 5.202312138728323e-06, 'epoch': 4.94} + 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|████████████████████████████████████████████████████████████████████████████ | 2203/2230 [7:31:32<05:44, 12.77s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2205/2230 [7:31:56<05:11, 12.48s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2205/2230 [7:31:56<05:11, 12.48s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.068, 'learning_rate': 5.028901734104045e-06, 'epoch': 4.94} + 99%|████████████████████████████████████████████████████████████████████████████▏| 2205/2230 [7:31:56<05:11, 12.48s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2205/2230 [7:31:56<05:11, 12.48s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2205/2230 [7:31:56<05:11, 12.48s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2205/2230 [7:31:56<05:11, 12.48s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0546, 'learning_rate': 4.855491329479768e-06, 'epoch': 4.95} + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0635, 'learning_rate': 4.682080924855491e-06, 'epoch': 4.95} + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0693, 'learning_rate': 4.508670520231213e-06, 'epoch': 4.95} + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:32:08<04:56, 12.35s/it]g-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0459, 'learning_rate': 4.335260115606936e-06, 'epoch': 4.95} +[WARNING|modeling_bart.py:1051] 2022-03-23 00:05:24,981 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:05:24,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:05:24,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:05:24,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0561, 'learning_rate': 4.161849710982659e-06, 'epoch': 4.96} + 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0614, 'learning_rate': 3.988439306358381e-06, 'epoch': 4.96} + 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|████████████████████████████████████████████████████████████████████████████▎| 2210/2230 [7:32:56<03:58, 11.94s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:05:55,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:05:55,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0694, 'learning_rate': 3.8150289017341036e-06, 'epoch': 4.96} +[WARNING|modeling_utils.py:388] 2022-03-23 00:05:55,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:05:55,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:05:55,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:06,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:06:06,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:06,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:06,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0504, 'learning_rate': 3.641618497109826e-06, 'epoch': 4.96} +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:06,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:06,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:06,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:06,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:06:06,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:06:22,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:06:22,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:06:22,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:06:22,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:06:22,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:06:22,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.064, 'learning_rate': 3.294797687861271e-06, 'epoch': 4.97} +[WARNING|modeling_bart.py:1051] 2022-03-23 00:06:22,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:06:22,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:06:22,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:40,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:40,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0734, 'learning_rate': 3.121387283236994e-06, 'epoch': 4.97} +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:40,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:06:40,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:40,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:51,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:51,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0597, 'learning_rate': 2.9479768786127167e-06, 'epoch': 4.97} +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:54,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:54,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:06:54,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-23 00:07:00,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:07:00,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▌| 2218/2230 [7:34:26<02:10, 10.84s/it] Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:05,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:05,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:05,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:11,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:07:11,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0282, 'learning_rate': 2.6011560693641614e-06, 'epoch': 4.98} +[WARNING|modeling_bart.py:1051] 2022-03-23 00:07:15,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:07:15,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:19,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:19,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:19,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0546, 'learning_rate': 2.427745664739884e-06, 'epoch': 4.98} +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:25,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:27,976 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:30,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:30,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:30,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:07:34,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-23 00:07:34,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:07:37,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:39,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:39,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:41,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:43,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:45,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:47,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:07:47,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:49,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:51,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:53,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:53,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:56,464 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:58,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:07:59,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:59,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:07:59,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:08:04,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:08:05,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:08:08,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:08:08,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:08:09,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:08:12,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:08:13,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:08:13,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:08:15,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:08:17,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-23 00:08:17,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 23:49:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-23 00:08:19,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 00:08:21,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 00:08:21,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 00:08:24,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 00:08:24,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1199, 'learning_rate': 6.936416184971098e-07, 'epoch': 5.0}
+[INFO|configuration_utils.py:438] 2022-03-23 00:08:24,862 >> Configuration saved in ./config.json
+[INFO|configuration_utils.py:438] 2022-03-23 00:08:36,626 >> Configuration saved in ./config.json
+[INFO|configuration_utils.py:438] 2022-03-23 00:08:36,626 >> Configuration saved in ./config.json