diff --git "a/wandb/run-20220325_193848-1sz5964i/files/output.log" "b/wandb/run-20220325_193848-1sz5964i/files/output.log" --- "a/wandb/run-20220325_193848-1sz5964i/files/output.log" +++ "b/wandb/run-20220325_193848-1sz5964i/files/output.log" @@ -6268,3 +6268,6174 @@ [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +03/25/2022 22:54:14 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220325_193848-1sz5964i/run-1sz5964i.wandb']. This may take a bit of time if the files are large. +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1344, 'learning_rate': 0.0002982, 'epoch': 2.25} +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2398, 'learning_rate': 0.0002988, 'epoch': 2.25} +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0204, 'learning_rate': 0.00029939999999999996, 'epoch': 2.26} +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.032, 'learning_rate': 0.0003, 'epoch': 2.26} + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9013, 'learning_rate': 0.0002995121951219512, 'epoch': 2.26} + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7927, 'learning_rate': 0.0002990243902439024, 'epoch': 2.27} + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7671, 'learning_rate': 0.0002985365853658536, 'epoch': 2.27} + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7971, 'learning_rate': 0.00029804878048780484, 'epoch': 2.28} + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.7401, 'learning_rate': 0.00029756097560975606, 'epoch': 2.28}
+{'loss': 3.5998, 'learning_rate': 0.0002970731707317073, 'epoch': 2.29}
+{'loss': 3.6632, 'learning_rate': 0.0002965853658536585, 'epoch': 2.29}
+{'loss': 3.6384, 'learning_rate': 0.0002960975609756097, 'epoch': 2.3}
+{'loss': 3.6145, 'learning_rate': 0.0002956097560975609, 'epoch': 2.3}
+{'loss': 3.5137, 'learning_rate': 0.0002951219512195122, 'epoch': 2.3}
+{'loss': 3.5113, 'learning_rate': 0.00029463414634146336, 'epoch': 2.31}
+{'loss': 3.4298, 'learning_rate': 0.0002941463414634146, 'epoch': 2.31}
+{'loss': 3.3688, 'learning_rate': 0.00029365853658536585, 'epoch': 2.32}
+{'loss': 3.3792, 'learning_rate': 0.00029317073170731706, 'epoch': 2.32}
+{'loss': 3.4692, 'learning_rate': 0.0002926829268292683, 'epoch': 2.33}
+{'loss': 3.3661, 'learning_rate': 0.0002921951219512195, 'epoch': 2.33} + 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▌ | 521/1115 [3:24:58<4:03:57, 24.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▌ | 521/1115 [3:24:58<4:03:57, 24.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4411, 'learning_rate': 0.0002917073170731707, 'epoch': 2.34} +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.509, 'learning_rate': 0.00029121951219512193, 'epoch': 2.34} +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▋ | 523/1115 [3:25:46<3:58:09, 24.14s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▋ | 523/1115 [3:25:46<3:58:09, 24.14s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3465, 'learning_rate': 0.00029073170731707315, 'epoch': 2.35} +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2509, 'learning_rate': 0.00029024390243902437, 'epoch': 2.35} +[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:06,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:06,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:06,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:05:06,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:06,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:06,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:06,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:20,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:20,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:20,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.1902, 'learning_rate': 0.0002897560975609756, 'epoch': 2.35} +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:20,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:20,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:20,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:05:20,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:35,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:35,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:35,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:35,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:35,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:35,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:35,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2894, 'learning_rate': 0.0002892682926829268, 'epoch': 2.36} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2102, 'learning_rate': 0.000288780487804878, 'epoch': 2.36} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.9926, 'learning_rate': 0.00028829268292682923, 'epoch': 2.37} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.1141, 'learning_rate': 0.00028780487804878045, 'epoch': 2.37} +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 48%|████████████████████████████████████▏ | 530/1115 [3:28:26<3:40:44, 22.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 48%|████████████████████████████████████▏ | 530/1115 [3:28:26<3:40:44, 22.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.9693, 'learning_rate': 0.00028731707317073167, 'epoch': 2.38} + 48%|████████████████████████████████████▏ | 530/1115 [3:28:26<3:40:44, 22.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:07:28,707 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:07:38,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:07:54,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 48%|████████████████████████████████████▎ | 532/1115 [3:29:09<3:34:24, 22.07s/it]
+{'loss': 3.0293, 'learning_rate': 0.0002863414634146341, 'epoch': 2.39}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:08:05,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:08:13,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:08:19,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.9632, 'learning_rate': 0.00028585365853658537, 'epoch': 2.39}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:08:25,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:08:31,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:08:37,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.8581, 'learning_rate': 0.00028536585365853654, 'epoch': 2.39}
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:08:42,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:08:46,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:08:52,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:08:56,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 48%|████████████████████████████████████▍ | 535/1115 [3:30:08<3:17:39, 20.45s/it]
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:00,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:04,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:06,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:10,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:12,966 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 48%|████████████████████████████████████▌ | 536/1115 [3:30:27<3:10:42, 19.76s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:19,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:22,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:26,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:28,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:30,967 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:33,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:35,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:37,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:39,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:41,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:45,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:47,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:49,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:51,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:53,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:55,401 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:57,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:09:59,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:00,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:02,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:04,594 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:06,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:10,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:11,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:13,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:15,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:16,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:20,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:21,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:23,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:24,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:27,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:29,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:32,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:33,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:36,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:38,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:40,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:41,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:44,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:45,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:48,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:50,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:52,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:54,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:56,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:10:57,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:00,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:02,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:04,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:06,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:08,729 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:10,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:11,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:14,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:14,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:16,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:16,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:20,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:20,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:23,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:23,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:27,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:27,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:31,046 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:34,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:34,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:38,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:38,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:41,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:41,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:45,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:45,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:49,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:49,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:52,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:52,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:56,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:59,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:11:59,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:03,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:03,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:06,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:06,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:10,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:10,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:13,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:13,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:17,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:17,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:20,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:24,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:24,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:27,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:27,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:31,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:34,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:34,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:38,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:38,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:38,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:41,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:44,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:44,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:48,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:48,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:51,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:51,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:55,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:58,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:12:58,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:01,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.93, 'learning_rate': 0.00027756097560975606, 'epoch': 2.47}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.6004, 'learning_rate': 0.0002770731707317073, 'epoch': 2.47}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:05,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.4879, 'learning_rate': 0.00027658536585365855, 'epoch': 2.48}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.3519, 'learning_rate': 0.0002760975609756097, 'epoch': 2.48}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.2981, 'learning_rate': 0.00027560975609756093, 'epoch': 2.48}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.1892, 'learning_rate': 0.0002751219512195122, 'epoch': 2.49}
+{'loss': 3.1885, 'learning_rate': 0.00027463414634146336, 'epoch': 2.49}
+{'loss': 2.9167, 'learning_rate': 0.00027414634146341463, 'epoch': 2.5}
+{'loss': 2.8781, 'learning_rate': 0.00027365853658536585, 'epoch': 2.5}
+{'loss': 2.8923, 'learning_rate': 0.000273170731707317, 'epoch': 2.51}
+{'loss': 2.6902, 'learning_rate': 0.0002726829268292683, 'epoch': 2.51}
+{'loss': 2.6606, 'learning_rate': 0.0002721951219512195, 'epoch': 2.52}
+{'loss': 2.5854, 'learning_rate': 0.0002717073170731707, 'epoch': 2.52}
+{'loss': 2.4889, 'learning_rate': 0.00027121951219512193, 'epoch': 2.52}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.3526, 'learning_rate': 0.00027073170731707315, 'epoch': 2.53} +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.3056, 'learning_rate': 0.00027024390243902437, 'epoch': 2.53} + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2212, 'learning_rate': 0.0002697560975609756, 'epoch': 2.54} + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.1994, 'learning_rate': 0.0002692682926829268, 'epoch': 2.54} + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1812, 'learning_rate': 0.000268780487804878, 'epoch': 2.55} + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.058, 'learning_rate': 0.00026829268292682924, 'epoch': 2.55} + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9894, 'learning_rate': 0.00026780487804878045, 'epoch': 2.56} + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0365, 'learning_rate': 0.0002673170731707317, 'epoch': 2.56} + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9143, 'learning_rate': 0.0002668292682926829, 'epoch': 2.57} +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7815, 'learning_rate': 0.0002663414634146341, 'epoch': 2.57} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.728, 'learning_rate': 0.0002658536585365854, 'epoch': 2.57} + 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7936, 'learning_rate': 0.00026536585365853654, 'epoch': 2.58} +[WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7649, 'learning_rate': 0.0002648780487804878, 'epoch': 2.58} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▎ | 577/1115 [3:45:51<3:29:43, 23.39s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▎ | 577/1115 [3:45:51<3:29:43, 23.39s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.6668, 'learning_rate': 0.000264390243902439, 'epoch': 2.59} + 52%|███████████████████████████████████████▎ | 577/1115 [3:45:51<3:29:43, 23.39s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▎ | 577/1115 [3:45:51<3:29:43, 23.39s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.5596, 'learning_rate': 0.0002639024390243902, 'epoch': 2.59} +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:07,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:07,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:11,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:11,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:11,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:11,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:20,186 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:25:20,186 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:24,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:24,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:24,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:28,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:28,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:28,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:25:28,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:28,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:28,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:28,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:42,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:25:42,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▌ | 580/1115 [3:46:57<3:19:19, 22.35s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 52%|███████████████████████████████████████▌ | 580/1115 [3:46:57<3:19:19, 22.35s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.5524, 'learning_rate': 0.0002629268292682927, 'epoch': 2.6} + 52%|███████████████████████████████████████▌ | 580/1115 [3:46:57<3:19:19, 22.35s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▌ | 580/1115 [3:46:57<3:19:19, 22.35s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▌ | 580/1115 [3:46:57<3:19:19, 22.35s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▌ | 580/1115 [3:46:57<3:19:19, 22.35s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▌ | 580/1115 [3:46:57<3:19:19, 22.35s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▌ | 580/1115 [3:46:57<3:19:19, 22.35s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:02,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.426, 'learning_rate': 0.0002624390243902439, 'epoch': 2.61} +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:29,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:29,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.4211, 'learning_rate': 0.0002619512195121951, 'epoch': 2.61} +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:29,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:29,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:29,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:40,051 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:40,051 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:40,051 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:46,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:26:46,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▋ | 583/1115 [3:48:00<3:09:58, 21.43s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▋ | 583/1115 [3:48:00<3:09:58, 21.43s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.4552, 'learning_rate': 0.0002614634146341463, 'epoch': 2.61} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:26:54,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:26:54,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:58,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:58,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:26:58,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:58,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:26:58,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:08,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:08,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:08,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.3657, 'learning_rate': 0.00026097560975609754, 'epoch': 2.62} +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:14,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:27:14,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:14,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:20,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:20,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:27:25,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:27:27,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▊ | 585/1115 [3:48:39<3:00:11, 20.40s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 52%|███████████████████████████████████████▊ | 585/1115 [3:48:39<3:00:11, 20.40s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:31,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:31,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:34,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:37,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:39,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:39,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:27:43,565 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:27:43,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:47,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:47,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:49,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:51,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:27:53,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:27:53,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:27:57,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:27:59,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:01,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:04,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:04,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:06,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:08,211 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:10,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:12,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:12,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:16,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:18,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:20,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:28:20,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:22,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:24,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:25,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:27,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:29,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:31,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:28:33,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:33,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:36,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:38,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:40,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:41,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:43,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:28:46,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:48,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:48,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:50,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:53,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:54,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:28:56,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:28:59,136 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:00,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:00,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:02,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:04,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:07,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:08,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:29:11,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:11,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:12,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:15,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:16,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:18,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:21,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:29:21,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:23,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:25,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:27,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:29,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:31,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:31,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:29:33,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:35,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:37,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:37,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:39,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:41,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:43,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:29:44,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:44,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:46,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:46,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:50,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:50,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:54,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:29:57,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:29:57,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:01,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:01,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:05,130 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:05,130 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:08,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:30:12,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:12,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:12,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:15,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:15,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:19,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:19,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:30:22,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:26,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:26,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:30,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:30,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:33,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:33,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:30:36,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:40,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:40,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:40,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:44,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:44,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:47,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:30:51,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:51,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:54,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:54,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:30:58,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:01,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:01,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:04,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:04,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:08,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:08,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:12,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:12,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:15,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:15,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:22,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:25,721 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:25,721 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:29,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:29,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:32,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.6587, 'learning_rate': 0.00025317073170731707, 'epoch': 2.69} +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.0331, 'learning_rate': 0.0002526829268292683, 'epoch': 2.7}
+{'loss': 2.6576, 'learning_rate': 0.0002521951219512195, 'epoch': 2.7}
+{'loss': 2.4141, 'learning_rate': 0.0002517073170731707, 'epoch': 2.7}
+{'loss': 2.2096, 'learning_rate': 0.00025121951219512194, 'epoch': 2.71}
+{'loss': 2.0265, 'learning_rate': 0.00025073170731707315, 'epoch': 2.71}
+{'loss': 1.9003, 'learning_rate': 0.00025024390243902437, 'epoch': 2.72}
+{'loss': 1.7156, 'learning_rate': 0.0002497560975609756, 'epoch': 2.72}
+{'loss': 1.6201, 'learning_rate': 0.0002492682926829268, 'epoch': 2.73}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.4503, 'learning_rate': 0.000248780487804878, 'epoch': 2.73} +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.4161, 'learning_rate': 0.00024829268292682924, 'epoch': 2.74} +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2728, 'learning_rate': 0.00024780487804878045, 'epoch': 2.74} +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2627, 'learning_rate': 0.00024731707317073167, 'epoch': 2.74} +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.301, 'learning_rate': 0.0002468292682926829, 'epoch': 2.75} +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.1987, 'learning_rate': 0.00024634146341463416, 'epoch': 2.75} +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.0851, 'learning_rate': 0.0002458536585365853, 'epoch': 2.76} +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.207, 'learning_rate': 0.00024536585365853654, 'epoch': 2.76} + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.0918, 'learning_rate': 0.0002448780487804878, 'epoch': 2.77} + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.1469, 'learning_rate': 0.00024439024390243897, 'epoch': 2.77} + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9882, 'learning_rate': 0.00024390243902439022, 'epoch': 2.78} + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.0057, 'learning_rate': 0.00024341463414634146, 'epoch': 2.78} + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9869, 'learning_rate': 0.00024292682926829268, 'epoch': 2.78} + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:49,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:49,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:49,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:40:49,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:49,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9118, 'learning_rate': 0.00024243902439024387, 'epoch': 2.79} +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▍ | 623/1115 [4:02:38<3:17:53, 24.13s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 56%|██████████████████████████████████████████▍ | 623/1115 [4:02:38<3:17:53, 24.13s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.8895, 'learning_rate': 0.0002419512195121951, 'epoch': 2.79} +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.7842, 'learning_rate': 0.00024146341463414633, 'epoch': 2.8} +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.8535, 'learning_rate': 0.00024097560975609755, 'epoch': 2.8} +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9101, 'learning_rate': 0.00024048780487804876, 'epoch': 2.81} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9084, 'learning_rate': 0.00023999999999999998, 'epoch': 2.81} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:08,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:08,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.8409, 'learning_rate': 0.0002395121951219512, 'epoch': 2.82} +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.8037, 'learning_rate': 0.0002390243902439024, 'epoch': 2.82} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:43:49,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:43:49,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:43:49,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:43:49,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:43:49,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:59,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:43:59,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:59,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:43:59,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.7994, 'learning_rate': 0.00023853658536585366, 'epoch': 2.83} + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.8201, 'learning_rate': 0.00023804878048780485, 'epoch': 2.83} + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:44:46,489 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:44:46,489 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████ | 632/1115 [4:06:01<2:56:52, 21.97s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████ | 632/1115 [4:06:01<2:56:52, 21.97s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.7229, 'learning_rate': 0.00023756097560975606, 'epoch': 2.83} + 57%|███████████████████████████████████████████ | 632/1115 [4:06:01<2:56:52, 21.97s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████ | 632/1115 [4:06:01<2:56:52, 21.97s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████ | 632/1115 [4:06:01<2:56:52, 21.97s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:00,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:45:00,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:00,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:07,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:07,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.7606, 'learning_rate': 0.0002370731707317073, 'epoch': 2.84} + g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:17,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:45:17,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:17,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:23,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:23,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:23,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:29,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:29,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:45:29,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:33,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:33,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:33,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:39,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:39,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:43,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:45:43,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:48,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 635/1115 [4:07:00<2:43:43, 20.47s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 635/1115 [4:07:00<2:43:43, 20.47s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:52,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:45:52,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:56,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:56,445 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:46:00,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:46:02,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:46:02,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:06,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 636/1115 [4:07:19<2:38:39, 19.87s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 636/1115 [4:07:19<2:38:39, 19.87s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:46:10,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:46:12,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:46:12,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:16,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:19,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:21,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:23,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:25,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:25,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.7164, 'learning_rate': 0.00023512195121951215, 'epoch': 2.86} +[WARNING|modeling_utils.py:388] 2022-03-25 23:46:28,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:46:31,045 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:46:33,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:46:33,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:37,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:39,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:41,460 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▍ | 638/1115 [4:07:53<2:27:15, 18.52s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▍ | 638/1115 [4:07:53<2:27:15, 18.52s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:45,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:47,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:49,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:51,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:52,980 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:54,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▌ | 639/1115 [4:08:08<2:18:42, 17.48s/it] Setting `use_cache=False`...1] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▌ | 639/1115 [4:08:08<2:18:42, 17.48s/it] Setting `use_cache=False`...1] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▌ | 639/1115 [4:08:08<2:18:42, 17.48s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:46:58,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:02,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:46:58,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:03,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:46:58,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:05,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:46:58,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:07,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:46:58,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:10,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:46:58,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▌ | 640/1115 [4:08:22<2:09:10, 16.32s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:47:12,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▌ | 640/1115 [4:08:22<2:09:10, 16.32s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:47:12,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:13,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:12,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:15,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:12,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:18,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:12,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:19,892 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-25 23:47:12,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:21,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:12,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:21,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:12,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▋ | 641/1115 [4:08:34<1:59:51, 15.17s/it] Setting `use_cache=False`...1] 2022-03-25 23:47:12,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:25,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:24,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:27,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:24,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:30,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:24,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:31,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:24,516 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:34,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:24,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:34,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:24,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|███████████████████████████████████████████▊ | 642/1115 [4:08:45<1:50:07, 13.97s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:47:35,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:38,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:35,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:40,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:35,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:41,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:35,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:44,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:35,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:46,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:45,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:46,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:45,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:48,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:45,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:50,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:45,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:52,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:45,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|███████████████████████████████████████████▉ | 644/1115 [4:09:05<1:31:54, 11.71s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:47:54,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|███████████████████████████████████████████▉ | 644/1115 [4:09:05<1:31:54, 11.71s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:47:54,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:56,600 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:47:54,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:58,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:54,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:00,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:47:54,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:02,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:02,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:02,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:02,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:04,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:02,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:06,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:02,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 646/1115 [4:09:18<1:12:09, 9.23s/it] Setting `use_cache=False`...1] 2022-03-25 23:48:02,088 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 646/1115 [4:09:18<1:12:09, 9.23s/it] Setting `use_cache=False`...1] 2022-03-25 23:48:02,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 646/1115 [4:09:18<1:12:09, 9.23s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 646/1115 [4:09:18<1:12:09, 9.23s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:13,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:16,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:16,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:20,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:20,456 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:24,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:24,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:27,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:31,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:31,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:34,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 647/1115 [4:09:47<1:58:10, 15.15s/it] Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 647/1115 [4:09:47<1:58:10, 15.15s/it] Setting `use_cache=False`...1] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 647/1115 [4:09:47<1:58:10, 15.15s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 647/1115 [4:09:47<1:58:10, 15.15s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:41,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:45,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:45,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:48,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:48,715 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:52,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:55,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:55,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:59,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:59,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:02,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:02,463 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▏ | 648/1115 [4:10:15<2:27:09, 18.91s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▏ | 648/1115 [4:10:15<2:27:09, 18.91s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:09,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:09,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:12,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:12,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:16,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:19,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:19,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:23,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:26,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:26,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:29,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 58%|████████████████████████████████████████████▏ | 649/1115 [4:10:42<2:46:22, 21.42s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▏ | 649/1115 [4:10:42<2:46:22, 21.42s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:36,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:39,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:39,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:43,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:46,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:46,772 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:50,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:50,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:53,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.1011, 'learning_rate': 0.00022878048780487802, 'epoch': 2.91} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1706, 'learning_rate': 0.00022829268292682924, 'epoch': 2.92} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8403, 'learning_rate': 0.00022780487804878048, 'epoch': 2.92} +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7287, 'learning_rate': 0.00022731707317073167, 'epoch': 2.93} + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.5158, 'learning_rate': 0.00022682926829268292, 'epoch': 2.93} + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2994, 'learning_rate': 0.00022634146341463413, 'epoch': 2.94} + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.099, 'learning_rate': 0.00022585365853658532, 'epoch': 2.94} + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:52:43,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:52:43,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:52:43,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:52:43,111 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:52:43,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:52:43,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:52:43,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:52:43,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.9544, 'learning_rate': 0.00022536585365853657, 'epoch': 2.95} + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.97, 'learning_rate': 0.00022487804878048778, 'epoch': 2.95} + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9019, 'learning_rate': 0.000224390243902439, 'epoch': 2.96} + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.873, 'learning_rate': 0.00022390243902439022, 'epoch': 2.96} + 59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it] Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:54:17,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:54:17,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:54:17,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:54:24,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:24,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:24,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:24,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:24,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:24,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.8081, 'learning_rate': 0.00022341463414634146, 'epoch': 2.96} +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:24,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:54:24,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:24,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:42,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:42,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:46,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:46,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:54:46,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:54:46,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|█████████████████████████████████████████████ | 662/1115 [4:16:05<2:54:27, 23.11s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|█████████████████████████████████████████████ | 662/1115 [4:16:05<2:54:27, 23.11s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.8081, 'learning_rate': 0.00022292682926829265, 'epoch': 2.97} + 59%|█████████████████████████████████████████████ | 662/1115 [4:16:05<2:54:27, 23.11s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|█████████████████████████████████████████████ | 662/1115 [4:16:05<2:54:27, 23.11s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|█████████████████████████████████████████████ | 662/1115 [4:16:05<2:54:27, 23.11s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|█████████████████████████████████████████████ | 662/1115 [4:16:05<2:54:27, 23.11s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 59%|█████████████████████████████████████████████ | 662/1115 [4:16:05<2:54:27, 23.11s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:08,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:08,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:08,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:08,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|█████████████████████████████████████████████▏ | 663/1115 [4:16:26<2:51:13, 22.73s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|█████████████████████████████████████████████▏ | 663/1115 [4:16:26<2:51:13, 22.73s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:55:18,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:18,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:18,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:18,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:27,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:27,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:27,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:33,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:33,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:33,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:37,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:37,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:41,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:41,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:55:45,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:47,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:49,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:51,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:51,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:51,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:55,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:55,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:55:59,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:01,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:03,288 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:05,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:07,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:08,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:56:08,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:10,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:12,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:15,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:17,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:18,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:21,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:56:21,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:23,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:25,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:28,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:30,361 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:31,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:31,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:56:34,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:36,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:38,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:39,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:39,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:41,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:41,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:56:45,463 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:45,463 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:49,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:49,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:52,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:56,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:56,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:56:59,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:56:59,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:03,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:03,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:06,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:06,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:10,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:10,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:14,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:14,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:17,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:21,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:21,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2184, 'learning_rate': 0.00021853658536585366, 'epoch': 3.01} +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9385, 'learning_rate': 0.00021804878048780485, 'epoch': 3.01} +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2234, 'learning_rate': 0.0002175609756097561, 'epoch': 3.02} +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2986, 'learning_rate': 0.0002170731707317073, 'epoch': 3.02} +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.1385, 'learning_rate': 0.0002165853658536585, 'epoch': 3.03} +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.0251, 'learning_rate': 0.00021609756097560974, 'epoch': 3.03} +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.8948, 'learning_rate': 0.00021560975609756096, 'epoch': 3.04}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.6973, 'learning_rate': 0.00021512195121951218, 'epoch': 3.04}
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.7551, 'learning_rate': 0.0002146341463414634, 'epoch': 3.04}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.6294, 'learning_rate': 0.00021414634146341464, 'epoch': 3.05}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.6347, 'learning_rate': 0.00021365853658536583, 'epoch': 3.05}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.6635, 'learning_rate': 0.00021317073170731704, 'epoch': 3.06}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.5707, 'learning_rate': 0.0002126829268292683, 'epoch': 3.06}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.5126, 'learning_rate': 0.00021219512195121948, 'epoch': 3.07}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.6055, 'learning_rate': 0.00021170731707317072, 'epoch': 3.07}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5535, 'learning_rate': 0.00021121951219512194, 'epoch': 3.08} +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5278, 'learning_rate': 0.00021073170731707313, 'epoch': 3.08} +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 62%|██████████████████████████████████████████████▉ | 688/1115 [4:26:16<3:02:06, 25.59s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.5102, 'learning_rate': 0.00021024390243902437, 'epoch': 3.09}
+{'loss': 0.5113, 'learning_rate': 0.0002097560975609756, 'epoch': 3.09}
+{'loss': 0.4592, 'learning_rate': 0.0002092682926829268, 'epoch': 3.09}
+{'loss': 0.4637, 'learning_rate': 0.00020878048780487802, 'epoch': 3.1}
+ 62%|██████████████████████████████████████████████▉ | 688/1115 [4:26:16<3:02:06, 25.59s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:06:27,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:06:44,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.4902, 'learning_rate': 0.00020829268292682927, 'epoch': 3.1}
+ 62%|███████████████████████████████████████████████▏ | 693/1115 [4:28:19<2:54:00, 24.74s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.4212, 'learning_rate': 0.00020780487804878046, 'epoch': 3.11}
+ 62%|███████████████████████████████████████████████▎ | 694/1115 [4:28:44<2:54:16, 24.84s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.4117, 'learning_rate': 0.00020731707317073167, 'epoch': 3.11}
+{'loss': 0.4027, 'learning_rate': 0.00020682926829268292, 'epoch': 3.12}
+ g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.4618, 'learning_rate': 0.0002063414634146341, 'epoch': 3.12}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:08:37,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.3627, 'learning_rate': 0.00020585365853658535, 'epoch': 3.13}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:08:47,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 00:08:47,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:08:47,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▌ | 698/1115 [4:30:18<2:45:35, 23.83s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▌ | 698/1115 [4:30:18<2:45:35, 23.83s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4113, 'learning_rate': 0.00020536585365853657, 'epoch': 3.13} +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3982, 'learning_rate': 0.0002048780487804878, 'epoch': 3.13} +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:09:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:09:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3762, 'learning_rate': 0.000204390243902439, 'epoch': 3.14} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:10:38,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:10:38,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:10:38,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3957, 'learning_rate': 0.00020341463414634146, 'epoch': 3.15} +[WARNING|modeling_utils.py:388] 2022-03-26 00:10:38,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:10:46,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:10:46,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:10:46,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:10:46,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:10:46,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3483, 'learning_rate': 0.00020292682926829265, 'epoch': 3.15} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 704/1115 [4:32:33<2:32:28, 22.26s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 704/1115 [4:32:33<2:32:28, 22.26s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3318, 'learning_rate': 0.0002024390243902439, 'epoch': 3.16} + 63%|███████████████████████████████████████████████▉ | 704/1115 [4:32:33<2:32:28, 22.26s/it]g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:11:41,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:11:41,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:11:41,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4213, 'learning_rate': 0.00020195121951219511, 'epoch': 3.16} +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:47,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:47,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:47,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:47,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:11:47,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:57,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:57,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:11:57,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:04,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:04,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.368, 'learning_rate': 0.0002014634146341463, 'epoch': 3.17} +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:08,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:12:08,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:08,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:08,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:16,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:16,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:12:20,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:12:20,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:12:24,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:24,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3552, 'learning_rate': 0.00020097560975609755, 'epoch': 3.17} +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:28,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:28,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:32,264 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:34,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:34,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:12:34,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:40,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:42,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:42,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:42,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:12:47,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:12:47,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:12:51,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:51,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:12:55,284 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:12:55,284 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:12:59,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:13:01,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:13:01,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:13:01,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:05,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:07,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:09,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:09,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:13:13,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:13:15,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:13:17,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:13:19,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:13:19,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:13:19,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:23,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:25,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:27,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:29,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:31,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:33,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:35,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:35,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:37,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:39,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:41,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:43,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:45,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:47,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:48,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:48,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:50,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:54,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:56,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:57,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:00,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:02,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:03,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:03,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:05,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:08,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:10,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:11,808 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:14,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:16,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:16,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:17,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:20,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:22,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:23,748 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:26,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:29,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:29,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:30,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:33,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:34,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:36,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:37,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:37,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:40,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:42,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:44,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:46,699 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:46,699 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:48,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:50,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:52,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:55,193 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:55,193 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:57,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:58,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:00,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:00,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:02,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:02,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:05,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:09,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:09,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:12,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:12,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:16,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:16,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:20,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:20,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:23,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:27,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:27,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:30,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:30,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3472, 'learning_rate': 0.0001946341463414634, 'epoch': 3.23} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:34,538 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:34,538 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:38,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:41,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:41,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:45,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:45,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:48,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:52,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:52,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:55,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:55,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:59,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:59,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2989, 'learning_rate': 0.0001941463414634146, 'epoch': 3.23} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:02,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:06,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:06,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:09,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:09,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:13,164 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:16,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:16,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:20,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:20,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:23,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:27,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:27,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.2984, 'learning_rate': 0.00019365853658536583, 'epoch': 3.24} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:30,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:30,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:34,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:37,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:37,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:40,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:40,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:44,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9822, 'learning_rate': 0.00019317073170731707, 'epoch': 3.24} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9058, 'learning_rate': 0.0001926829268292683, 'epoch': 3.25} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.8659, 'learning_rate': 0.00019219512195121948, 'epoch': 3.25} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.7585, 'learning_rate': 0.00019170731707317072, 'epoch': 3.26} + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.7159, 'learning_rate': 0.00019121951219512194, 'epoch': 3.26} + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.643, 'learning_rate': 0.00019073170731707316, 'epoch': 3.26} + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.5668, 'learning_rate': 0.00019024390243902437, 'epoch': 3.27} + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5185, 'learning_rate': 0.00018975609756097562, 'epoch': 3.27} + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.5032, 'learning_rate': 0.0001892682926829268, 'epoch': 3.28}
+{'loss': 0.478, 'learning_rate': 0.00018878048780487803, 'epoch': 3.28}
+{'loss': 0.4333, 'learning_rate': 0.00018829268292682927, 'epoch': 3.29}
+ 66%|██████████████████████████████████████████████████ | 734/1115 [4:42:59<2:45:49, 26.11s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.4642, 'learning_rate': 0.00018780487804878046, 'epoch': 3.29}
+{'loss': 0.4117, 'learning_rate': 0.0001873170731707317, 'epoch': 3.3}
+{'loss': 0.4252, 'learning_rate': 0.00018682926829268292, 'epoch': 3.3}
+ 66%|██████████████████████████████████████████████████▏ | 737/1115 [4:44:15<2:40:45, 25.52s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.3947, 'learning_rate': 0.0001863414634146341, 'epoch': 3.3}
+{'loss': 0.4007, 'learning_rate': 0.00018585365853658535, 'epoch': 3.31}
+ 66%|██████████████████████████████████████████████████▎ | 739/1115 [4:45:06<2:39:15, 25.41s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.3837, 'learning_rate': 0.0001848780487804878, 'epoch': 3.32}
+{'loss': 0.3415, 'learning_rate': 0.000184390243902439, 'epoch': 3.32}
+{'loss': 0.4264, 'learning_rate': 0.00018390243902439025, 'epoch': 3.33} + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|████████████████████████████████████���█████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3209, 'learning_rate': 0.00018341463414634144, 'epoch': 3.33} + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3521, 'learning_rate': 0.00018292682926829266, 'epoch': 3.34} + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3439, 'learning_rate': 0.0001824390243902439, 'epoch': 3.34} + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it] Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3283, 'learning_rate': 0.0001819512195121951, 'epoch': 3.35} +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3709, 'learning_rate': 0.00018146341463414633, 'epoch': 3.35} +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3629, 'learning_rate': 0.00018097560975609755, 'epoch': 3.35} + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3261, 'learning_rate': 0.00018048780487804877, 'epoch': 3.36} + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3054, 'learning_rate': 0.00017999999999999998, 'epoch': 3.36} + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3341, 'learning_rate': 0.0001795121951219512, 'epoch': 3.37} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:28:53,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:28:53,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:28:53,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:28:53,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:01,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:01,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:01,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:06,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:06,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:06,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:29:06,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:14,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:14,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:14,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:14,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:14,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:14,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3293, 'learning_rate': 0.00017853658536585363, 'epoch': 3.38} +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:30,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:30,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:30,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:30,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:30,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:30,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:30,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:30,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:30,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:48,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:48,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3783, 'learning_rate': 0.00017804878048780485, 'epoch': 3.38} +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:48,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:48,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:29:48,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:59,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:29:59,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:03,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:03,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:03,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:03,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:30:03,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3352, 'learning_rate': 0.0001775609756097561, 'epoch': 3.39} +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:03,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:15,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:15,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:15,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:15,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.275, 'learning_rate': 0.00017707317073170729, 'epoch': 3.39} +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:44,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:44,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:44,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▌ | 757/1115 [4:52:00<2:07:08, 21.31s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▌ | 757/1115 [4:52:00<2:07:08, 21.31s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3101, 'learning_rate': 0.00017658536585365853, 'epoch': 3.39} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:54,631 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:57,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:57,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:00,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:00,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:00,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:06,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:06,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:06,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:11,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:11,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:11,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:16,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:19,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:31:19,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:23,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:23,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:27,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:27,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2583, 'learning_rate': 0.00017560975609756094, 'epoch': 3.4} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:31,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:31,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:31:35,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:37,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:39,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:39,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:43,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:45,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:45,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.2986, 'learning_rate': 0.00017512195121951218, 'epoch': 3.41} +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:49,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:51,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:53,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:55,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:57,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:31:59,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:01,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:32:01,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:04,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:06,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:07,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:09,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:11,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:13,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:32:15,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:17,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:17,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:19,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:21,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:24,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:24,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:32:27,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:30,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:32,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:34,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:34,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:35,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:37,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:32:39,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:42,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:43,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:45,264 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:45,264 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:48,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:49,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:32:52,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:54,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:56,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:58,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:32:58,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:00,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:01,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:33:04,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:06,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:08,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:08,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:10,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:12,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:14,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:33:16,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:16,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:18,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:21,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:22,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:22,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:24,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:33:27,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:27,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:30,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:30,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:32,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:32,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:36,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:33:36,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:39,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:39,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:43,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:47,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:47,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:50,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:33:50,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:54,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:54,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:33:57,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:01,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:01,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9187, 'learning_rate': 0.00017024390243902438, 'epoch': 3.45} +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:05,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:34:05,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:08,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:08,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:12,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:15,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:15,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:19,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:34:19,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:22,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:26,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:26,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.4796, 'learning_rate': 0.00016975609756097557, 'epoch': 3.46} +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:29,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:29,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:33,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:34:33,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:36,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:40,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:40,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:43,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:43,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:47,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:34:47,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:50,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:54,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:54,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9913, 'learning_rate': 0.0001692682926829268, 'epoch': 3.46} +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:57,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:34:57,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:01,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:35:04,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:04,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.6812, 'learning_rate': 0.00016878048780487803, 'epoch': 3.47}
+{'loss': 0.6404, 'learning_rate': 0.00016829268292682927, 'epoch': 3.47}
+{'loss': 0.6294, 'learning_rate': 0.00016780487804878046, 'epoch': 3.48}
+{'loss': 0.5524, 'learning_rate': 0.0001673170731707317, 'epoch': 3.48}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4834, 'learning_rate': 0.00016682926829268292, 'epoch': 3.48} +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4867, 'learning_rate': 0.0001663414634146341, 'epoch': 3.49} +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4394, 'learning_rate': 0.00016585365853658536, 'epoch': 3.49} +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4228, 'learning_rate': 0.00016536585365853657, 'epoch': 3.5} + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3963, 'learning_rate': 0.0001648780487804878, 'epoch': 3.5} + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3693, 'learning_rate': 0.000164390243902439, 'epoch': 3.51} + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3423, 'learning_rate': 0.00016390243902439025, 'epoch': 3.51} + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3681, 'learning_rate': 0.00016341463414634144, 'epoch': 3.52} + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:40:26,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:40:26,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:40:26,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:40:26,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:40:26,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:40:26,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:40:26,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:40:26,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.3625, 'learning_rate': 0.0001624390243902439, 'epoch': 3.52} + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|████████████████████████████████████████████████████���▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.335, 'learning_rate': 0.0001619512195121951, 'epoch': 3.53} + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3001, 'learning_rate': 0.00016146341463414634, 'epoch': 3.53} + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3523, 'learning_rate': 0.00016097560975609755, 'epoch': 3.54} + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:42:35,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:42:35,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3084, 'learning_rate': 0.00016048780487804874, 'epoch': 3.54}
+{'loss': 0.3502, 'learning_rate': 0.00015999999999999999, 'epoch': 3.55}
+{'loss': 0.3246, 'learning_rate': 0.0001595121951219512, 'epoch': 3.55}
+{'loss': 0.3061, 'learning_rate': 0.00015902439024390242, 'epoch': 3.56}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:44:08,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.3581, 'learning_rate': 0.00015853658536585364, 'epoch': 3.56}
+{'loss': 0.3109, 'learning_rate': 0.00015804878048780488, 'epoch': 3.57}
+ 71%|██████████████████████████████████████████████████████▎ | 796/1115 [5:06:21<2:08:05, 24.09s/it]
+{'loss': 0.2814, 'learning_rate': 0.00015756097560975607, 'epoch': 3.57}
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:45:32,437 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 71%|██████████████████████████████████████████████████████▎ | 797/1115 [5:06:45<2:06:19, 23.83s/it]
+{'loss': 0.2509, 'learning_rate': 0.0001570731707317073, 'epoch': 3.57}
+{'loss': 0.3046, 'learning_rate': 0.00015658536585365853, 'epoch': 3.58}
+[WARNING|modeling_utils.py:388] 2022-03-26 00:46:13,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:46:13,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:46:13,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:46:13,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:46:13,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2995, 'learning_rate': 0.00015609756097560975, 'epoch': 3.58} +[WARNING|modeling_utils.py:388] 2022-03-26 00:46:13,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:25,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:25,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:25,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:25,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2679, 'learning_rate': 0.00015560975609756097, 'epoch': 3.59} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:46:56,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:46:56,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:47:00,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:47:00,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:47:00,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3044, 'learning_rate': 0.00015512195121951218, 'epoch': 3.59} + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.286, 'learning_rate': 0.00015463414634146343, 'epoch': 3.6} +[WARNING|modeling_utils.py:388] 2022-03-26 00:47:33,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:47:33,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:47:33,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:47:33,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:47:33,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:47:33,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:47:45,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:47:45,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▋ | 803/1115 [5:09:00<1:56:02, 22.32s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▋ | 803/1115 [5:09:00<1:56:02, 22.32s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.2883, 'learning_rate': 0.00015414634146341462, 'epoch': 3.6} + 72%|██████████████████████████████████████████████████████▋ | 803/1115 [5:09:00<1:56:02, 22.32s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▋ | 803/1115 [5:09:00<1:56:02, 22.32s/it]g-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:47:58,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:47:58,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:47:58,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:03,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:03,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:48:07,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:07,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:07,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3195, 'learning_rate': 0.00015365853658536583, 'epoch': 3.61} +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:07,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:07,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:07,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:07,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:48:22,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:22,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:22,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:22,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:22,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:22,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:32,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:48:32,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:32,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:32,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:32,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:42,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:42,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:48:46,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:48:46,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:51,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:48:51,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2843, 'learning_rate': 0.00015268292682926827, 'epoch': 3.61} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:48:54,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:48:54,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:48:54,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:01,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:01,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:04,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:07,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:07,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:07,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:07,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2142, 'learning_rate': 0.0001521951219512195, 'epoch': 3.62} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:07,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:17,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:17,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:49:21,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:49:23,414 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:49:23,414 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:27,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:29,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:29,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2466, 'learning_rate': 0.00015170731707317073, 'epoch': 3.62} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:29,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:35,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:38,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:38,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:49:41,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:49:44,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:49:44,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:48,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:48,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2987, 'learning_rate': 0.00015121951219512192, 'epoch': 3.63} +[WARNING|modeling_utils.py:388] 2022-03-26 00:49:51,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:49:54,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:49:56,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:49:58,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 00:50:00,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:50:02,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:50:04,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:50:04,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 00:50:04,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:08,636 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:10,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:12,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:14,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:16,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:18,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:20,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:20,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 811/1115 [5:11:32<1:31:37, 18.08s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:50:22,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:24,486 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:50:22,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:26,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:22,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:28,284 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:22,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:30,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:22,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:31,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:22,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:33,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:22,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:33,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:22,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 812/1115 [5:11:47<1:26:31, 17.13s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:50:37,449 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:39,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:37,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:40,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:37,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:42,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:37,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:45,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:37,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:46,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:37,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:48,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:37,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▍ | 813/1115 [5:12:02<1:22:11, 16.33s/it] Setting `use_cache=False`...1] 2022-03-26 00:50:37,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|███████████████████████████████████████████████████████▍ | 813/1115 [5:12:02<1:22:11, 16.33s/it] Setting `use_cache=False`...1] 2022-03-26 00:50:37,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:53,513 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:51,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:55,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:51,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:56,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:51,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:58,135 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:51,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:01,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:51,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:02,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:50:51,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:02,630 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:50:51,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:05,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:04,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:06,921 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:04,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:09,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:04,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:10,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:04,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:13,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:04,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:13,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:04,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▌ | 815/1115 [5:12:25<1:09:10, 13.83s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:51:14,919 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:17,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:14,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:19,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:14,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:20,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:14,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:23,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:14,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:23,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:14,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:25,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:24,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:27,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:24,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:29,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:24,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:31,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:24,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:31,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:24,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:33,803 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:32,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:35,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:32,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:37,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:32,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████████▏ | 818/1115 [5:12:50<50:11, 10.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:51:40,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|█████████████████████████████████████████████████████████▏ | 818/1115 [5:12:50<50:11, 10.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:51:40,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:41,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:40,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:44,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:40,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:45,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:40,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:45,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:40,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████████▎ | 819/1115 [5:12:57<45:33, 9.23s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████████▎ | 819/1115 [5:12:57<45:33, 9.23s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|█████████████████████████████████████████████████████████▎ | 819/1115 [5:12:57<45:33, 9.23s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:52,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:55,825 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:55,825 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:59,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:59,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:03,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:06,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:06,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:10,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:10,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:13,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 74%|███████████████████████████████████████████████████████▉ | 820/1115 [5:13:27<1:14:58, 15.25s/it] Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 74%|███████████████████████████████████████████████████████▉ | 820/1115 [5:13:27<1:14:58, 15.25s/it] Setting `use_cache=False`...1] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 74%|███████████████████████████████████████████████████████▉ | 820/1115 [5:13:27<1:14:58, 15.25s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 74%|███████████████████████████████████████████████████████▉ | 820/1115 [5:13:27<1:14:58, 15.25s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:20,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:24,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:24,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:28,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:28,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:31,573 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:31,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:35,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:38,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:38,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:42,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 74%|███████████████████████████████████████████████████████▉ | 821/1115 [5:13:55<1:33:43, 19.13s/it] Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 74%|███████████████████████████████████████████████████████▉ | 821/1115 [5:13:55<1:33:43, 19.13s/it] Setting `use_cache=False`...1] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 74%|███████████████████████████████████████████████████████▉ | 821/1115 [5:13:55<1:33:43, 19.13s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:49,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:49,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:52,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:52,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:55,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:59,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:59,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:02,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:02,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:06,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:09,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:09,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:09,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 74%|████████████████████████████████████████████████████████ | 822/1115 [5:14:22<1:45:58, 21.70s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 74%|████████████████████████████████████████████████████████ | 822/1115 [5:14:22<1:45:58, 21.70s/it][WARNING|modeling_bart.py:1051] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:16,642 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:20,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:20,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:23,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:23,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:26,816 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:30,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:30,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5068, 'learning_rate': 0.000144390243902439, 'epoch': 3.69} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.473, 'learning_rate': 0.00014390243902439023, 'epoch': 3.7} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4847, 'learning_rate': 0.00014341463414634144, 'epoch': 3.7} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4595, 'learning_rate': 0.00014292682926829269, 'epoch': 3.7} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3926, 'learning_rate': 0.00014243902439024388, 'epoch': 3.71} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3979, 'learning_rate': 0.0001419512195121951, 'epoch': 3.71} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3735, 'learning_rate': 0.00014146341463414634, 'epoch': 3.72} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3384, 'learning_rate': 0.00014097560975609755, 'epoch': 3.72} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3536, 'learning_rate': 0.00014048780487804877, 'epoch': 3.73} +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.3151, 'learning_rate': 0.00014, 'epoch': 3.73}
+{'loss': 0.3147, 'learning_rate': 0.0001395121951219512, 'epoch': 3.74}
+{'loss': 0.3133, 'learning_rate': 0.00013902439024390242, 'epoch': 3.74}
+{'loss': 0.255, 'learning_rate': 0.00013853658536585364, 'epoch': 3.74}
+{'loss': 0.2868, 'learning_rate': 0.00013804878048780486, 'epoch': 3.75}
+{'loss': 0.2783, 'learning_rate': 0.0001375609756097561, 'epoch': 3.75}
+{'loss': 0.2877, 'learning_rate': 0.00013707317073170732, 'epoch': 3.76}
+{'loss': 0.2775, 'learning_rate': 0.0001365853658536585, 'epoch': 3.76}
+{'loss': 0.2804, 'learning_rate': 0.00013609756097560975, 'epoch': 3.77}
+{'loss': 0.2744, 'learning_rate': 0.00013560975609756097, 'epoch': 3.77}
+{'loss': 0.2385, 'learning_rate': 0.00013512195121951218, 'epoch': 3.78}
+{'loss': 0.2474, 'learning_rate': 0.0001346341463414634, 'epoch': 3.78}
+[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2529, 'learning_rate': 0.00013414634146341462, 'epoch': 3.78} +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:03:00,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:03:00,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2812, 'learning_rate': 0.00013365853658536586, 'epoch': 3.79} + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.2556, 'learning_rate': 0.00013317073170731705, 'epoch': 3.79} + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2562, 'learning_rate': 0.0001321951219512195, 'epoch': 3.8} + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.2345, 'learning_rate': 0.00013170731707317073, 'epoch': 3.81} + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2582, 'learning_rate': 0.00013121951219512195, 'epoch': 3.81} + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it] Setting `use_cache=False`...e computed-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|██████████████████████████████████████████████████████████ | 851/1115 [5:26:34<1:41:50, 23.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.2171, 'learning_rate': 0.00013073170731707316, 'epoch': 3.82}
+ 76%|██████████████████████████████████████████████████████████ | 851/1115 [5:26:34<1:41:50, 23.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 01:05:46,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:05:50,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:04,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▏ | 853/1115 [5:27:17<1:37:50, 22.41s/it] Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▏ | 853/1115 [5:27:17<1:37:50, 22.41s/it] Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.227, 'learning_rate': 0.0001297560975609756, 'epoch': 3.83} + 77%|██████████████████████████████████████████████████████████▏ | 853/1115 [5:27:17<1:37:50, 22.41s/it] Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▏ | 853/1115 [5:27:17<1:37:50, 22.41s/it] Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▏ | 853/1115 [5:27:17<1:37:50, 22.41s/it] Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:17,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:06:17,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:17,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:23,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:23,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:23,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:23,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2222, 'learning_rate': 0.00012926829268292681, 'epoch': 3.83} +[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:23,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:23,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:23,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:23,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:39,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:39,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:39,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:45,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:06:45,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:45,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:49,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:49,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:49,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:55,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:06:55,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:06:55,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:02,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:02,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:02,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:08,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:08,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.235, 'learning_rate': 0.00012829268292682925, 'epoch': 3.84} +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:08,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:08,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:08,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:18,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:18,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:18,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:24,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:24,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:28,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:28,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2213, 'learning_rate': 0.0001278048780487805, 'epoch': 3.84} +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:28,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:34,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:37,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:37,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:37,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:42,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:42,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:46,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:46,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:46,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:51,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:51,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:07:55,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:57,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:07:57,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:01,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:01,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:08:05,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:08:05,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:08:07,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:08:09,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:08:11,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:08:13,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:08:16,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:08:18,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:08:20,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:08:20,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:08:20,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▌ | 860/1115 [5:29:34<1:20:34, 18.96s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:08:24,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:26,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:24,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:28,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:24,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:30,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:24,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:32,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:24,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:34,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:24,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:35,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:24,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:37,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:24,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:37,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:24,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▋ | 861/1115 [5:29:50<1:16:17, 18.02s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:08:39,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:41,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:39,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:43,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:39,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:45,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:39,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:47,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:39,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:50,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:39,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:52,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:39,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:52,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:39,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▊ | 862/1115 [5:30:04<1:11:47, 17.02s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:56,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:58,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:59,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:02,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:03,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:05,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▊ | 863/1115 [5:30:19<1:08:12, 16.24s/it] Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▊ | 863/1115 [5:30:19<1:08:12, 16.24s/it] Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:10,545 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:09:08,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:12,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:08,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:08,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:16,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:08,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:18,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:08,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▉ | 864/1115 [5:30:31<1:02:54, 15.04s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▉ | 864/1115 [5:30:31<1:02:54, 15.04s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:22,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:25,154 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:26,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:29,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|████████████████████████████████████████████████████████████▌ | 865/1115 [5:30:42<57:10, 13.72s/it] Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|████████████████████████████████████████████████████████████▌ | 865/1115 [5:30:42<57:10, 13.72s/it] Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:32,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:31,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:35,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:31,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:36,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:31,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:38,704 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:31,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|████████████████████████████████████████████████████████████▌ | 866/1115 [5:30:51<51:31, 12.41s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:40,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|████████████████████████████████████████████████████████████▌ | 866/1115 [5:30:51<51:31, 12.41s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:40,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:43,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:40,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:45,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:40,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:47,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:40,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|████████████████████████████████████████████████████████████▋ | 867/1115 [5:30:59<45:56, 11.12s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|████████████████████████████████████████████████████████████▋ | 867/1115 [5:30:59<45:56, 11.12s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:50,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:53,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:55,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:55,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:56,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:56,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:59,283 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:09:56,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:01,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:56,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|████████████████████████████████████████████████████████████▊ | 869/1115 [5:31:13<36:58, 9.02s/it] Setting `use_cache=False`...1] 2022-03-26 01:09:56,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|████████████████████████████████████████████████████████████▊ | 869/1115 [5:31:13<36:58, 9.02s/it] Setting `use_cache=False`...1] 2022-03-26 01:09:56,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|████████████████████████████████████████████████████████████▊ | 869/1115 [5:31:13<36:58, 9.02s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|████████████████████████████████████████████████████████████▊ | 869/1115 [5:31:13<36:58, 9.02s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:07,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:11,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:11,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:14,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:14,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:18,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:21,988 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:21,988 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:25,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:25,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:29,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▎ | 870/1115 [5:31:42<1:00:53, 14.91s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▎ | 870/1115 [5:31:42<1:00:53, 14.91s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▎ | 870/1115 [5:31:42<1:00:53, 14.91s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:36,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:36,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:39,472 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:39,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:42,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:46,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:46,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:49,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:49,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:52,982 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:56,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▎ | 871/1115 [5:32:09<1:15:49, 18.64s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▎ | 871/1115 [5:32:09<1:15:49, 18.64s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▎ | 871/1115 [5:32:09<1:15:49, 18.64s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:03,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:03,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:06,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:06,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:09,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:13,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:13,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:16,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:16,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:19,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:23,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 872/1115 [5:32:36<1:25:28, 21.11s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 872/1115 [5:32:36<1:25:28, 21.11s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 872/1115 [5:32:36<1:25:28, 21.11s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:30,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:30,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:33,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:33,444 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:36,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:40,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:40,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:43,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:43,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:46,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:50,042 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▌ | 873/1115 [5:33:03<1:31:55, 22.79s/it] Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▌ | 873/1115 [5:33:03<1:31:55, 22.79s/it] Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▌ | 873/1115 [5:33:03<1:31:55, 22.79s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:56,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:56,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3146, 'learning_rate': 0.0001195121951219512, 'epoch': 3.92} +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3146, 'learning_rate': 0.00011902439024390242, 'epoch': 3.92} +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3472, 'learning_rate': 0.00011853658536585365, 'epoch': 3.93} +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2634, 'learning_rate': 0.00011804878048780487, 'epoch': 3.93} +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3482, 'learning_rate': 0.00011756097560975607, 'epoch': 3.94}
+{'loss': 0.2643, 'learning_rate': 0.0001170731707317073, 'epoch': 3.94}
+ 79%|███████████████████████████████████████████████████████████▉ | 880/1115 [5:36:01<1:36:56, 24.75s/it]
+{'loss': 0.2699, 'learning_rate': 0.00011658536585365852, 'epoch': 3.95}
+[WARNING|modeling_utils.py:388] 2022-03-26 01:14:54,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.2764, 'learning_rate': 0.00011609756097560974, 'epoch': 3.95}
+{'loss': 0.2324, 'learning_rate': 0.00011560975609756097, 'epoch': 3.96}
+{'loss': 0.2683, 'learning_rate': 0.00011512195121951219, 'epoch': 3.96}
+[WARNING|modeling_utils.py:388] 2022-03-26 01:16:16,876 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:16:26,770 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:16:34,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.2287, 'learning_rate': 0.00011414634146341462, 'epoch': 3.97}
+[WARNING|modeling_utils.py:388] 2022-03-26 01:16:57,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:17:07,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:17:13,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:17:17,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:17:23,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 80%|████████████████████████████████████████████████████████████▍ | 887/1115 [5:38:35<1:20:55, 21.29s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 01:17:29,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:17:31,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:17:37,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:17:39,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:17:41,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:17:41,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:17:43,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:17:45,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:17:47,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:17:49,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:17:51,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:17:52,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:17:54,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:17:54,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:17:57,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:17:59,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:01,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:03,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:18:05,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:07,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:07,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:09,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:11,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:14,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:16,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:18:16,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:18,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:19,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:21,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:21,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:23,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:26,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:18:26,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:30,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:30,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:33,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:33,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:37,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:37,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:18:41,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:44,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:44,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:48,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:48,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:51,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:51,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.4579, 'learning_rate': 0.00011024390243902438, 'epoch': 4.0} +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:55,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:59,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:18:59,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:02,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:02,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:06,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:09,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:09,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:13,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:13,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4033, 'learning_rate': 0.0001097560975609756, 'epoch': 4.01} +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.313, 'learning_rate': 0.00010926829268292683, 'epoch': 4.01} +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2394, 'learning_rate': 0.00010878048780487805, 'epoch': 4.02} +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2394, 'learning_rate': 0.00010829268292682925, 'epoch': 4.02} +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2715, 'learning_rate': 0.00010780487804878048, 'epoch': 4.03} +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.24, 'learning_rate': 0.0001073170731707317, 'epoch': 4.03} + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2086, 'learning_rate': 0.00010682926829268291, 'epoch': 4.04} + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2325, 'learning_rate': 0.00010634146341463414, 'epoch': 4.04} + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1837, 'learning_rate': 0.00010585365853658536, 'epoch': 4.04} + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1641, 'learning_rate': 0.00010536585365853656, 'epoch': 4.05} + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.169, 'learning_rate': 0.0001048780487804878, 'epoch': 4.05} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1444, 'learning_rate': 0.00010439024390243901, 'epoch': 4.06} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1955, 'learning_rate': 0.00010390243902439023, 'epoch': 4.06} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1712, 'learning_rate': 0.00010341463414634146, 'epoch': 4.07} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1723, 'learning_rate': 0.00010292682926829268, 'epoch': 4.07} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1577, 'learning_rate': 0.0001024390243902439, 'epoch': 4.08} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1411, 'learning_rate': 0.00010195121951219511, 'epoch': 4.08} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.147, 'learning_rate': 0.00010146341463414633, 'epoch': 4.09} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1688, 'learning_rate': 0.00010097560975609756, 'epoch': 4.09} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1425, 'learning_rate': 0.00010048780487804877, 'epoch': 4.09} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.184, 'learning_rate': 9.999999999999999e-05, 'epoch': 4.1} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1604, 'learning_rate': 9.951219512195122e-05, 'epoch': 4.1} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1326, 'learning_rate': 9.902439024390243e-05, 'epoch': 4.11} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1505, 'learning_rate': 9.853658536585364e-05, 'epoch': 4.11} + 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1165, 'learning_rate': 9.804878048780487e-05, 'epoch': 4.12} +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1217, 'learning_rate': 9.756097560975609e-05, 'epoch': 4.12} +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:30:24,160 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:30:24,160 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:30:24,160 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:30:24,160 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|██████████████████████████████████████████████████████████████▋ | 920/1115 [5:51:42<1:18:17, 24.09s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|██████████████████████████████████████████████████████████████▋ | 920/1115 [5:51:42<1:18:17, 24.09s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1376, 'learning_rate': 9.70731707317073e-05, 'epoch': 4.13} + 83%|██████████████████████████████████████████████████████████████▋ | 920/1115 [5:51:42<1:18:17, 24.09s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|██████████████████████████████████████████████████████████████▋ | 920/1115 [5:51:42<1:18:17, 24.09s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|██████████████████████████████████████████████████████████████▋ | 920/1115 [5:51:42<1:18:17, 24.09s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:30:42,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:30:42,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:30:42,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:30:48,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:30:48,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:30:48,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|██████████████████████████████████████████████████████████████▊ | 921/1115 [5:52:05<1:16:58, 23.81s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|██████████████████████████████████████████████████████████████▊ | 921/1115 [5:52:05<1:16:58, 23.81s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1416, 'learning_rate': 9.658536585365854e-05, 'epoch': 4.13} + 83%|██████████████████████████████████████████████████████████████▊ | 921/1115 [5:52:05<1:16:58, 23.81s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|██████████████████████████████████████████████████████████████▊ | 921/1115 [5:52:05<1:16:58, 23.81s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|██████████████████████████████████████████████████████████████▊ | 921/1115 [5:52:05<1:16:58, 23.81s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|██████████████████████████████████████████████████████████████▊ | 921/1115 [5:52:05<1:16:58, 23.81s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:31:16,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1483, 'learning_rate': 9.609756097560974e-05, 'epoch': 4.13}
+ 83%|██████████████████████████████████████████████████████████████▉ | 923/1115 [5:52:50<1:14:21, 23.24s/it]
+{'loss': 0.1441, 'learning_rate': 9.560975609756097e-05, 'epoch': 4.14}
+ 83%|██████████████████████████████████████████████████████████████▉ | 924/1115 [5:53:13<1:13:13, 23.00s/it]
+{'loss': 0.1459, 'learning_rate': 9.512195121951219e-05, 'epoch': 4.14}
+[WARNING|modeling_utils.py:388] 2022-03-26 01:32:15,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:32:19,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:32:23,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.138, 'learning_rate': 9.46341463414634e-05, 'epoch': 4.15}
+[WARNING|modeling_utils.py:388] 2022-03-26 01:32:41,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1196, 'learning_rate': 9.414634146341463e-05, 'epoch': 4.15}
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:33:06,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 83%|███████████████████████████████████████████████████████████████▏ | 927/1115 [5:54:18<1:09:19, 22.12s/it]
+{'loss': 0.1039, 'learning_rate': 9.365853658536585e-05, 'epoch': 4.16}
+[WARNING|modeling_utils.py:388] 2022-03-26 01:33:18,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:33:28,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1461, 'learning_rate': 9.317073170731706e-05, 'epoch': 4.16}
+[WARNING|modeling_utils.py:388] 2022-03-26 01:33:32,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:33:39,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:33:47,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 83%|███████████████████████████████████████████████████████████████▎ | 929/1115 [5:55:00<1:06:01, 21.30s/it]
+{'loss': 0.156, 'learning_rate': 9.268292682926829e-05, 'epoch': 4.17}
+[WARNING|modeling_utils.py:388] 2022-03-26 01:33:55,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:33:59,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 01:34:04,041 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:34:10,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1005, 'learning_rate': 9.21951219512195e-05, 'epoch': 4.17}
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:17,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 01:34:21,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:26,106 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 83%|███████████████████████████████████████████████████████████████▍ | 931/1115 [5:55:38<1:01:58, 20.21s/it]
+[WARNING|modeling_utils.py:388] 2022-03-26 01:34:30,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:38,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:40,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 01:34:44,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 01:34:46,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1158, 'learning_rate': 9.121951219512195e-05, 'epoch': 4.18}
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:50,877 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:53,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:55,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:57,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:59,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:01,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:03,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:05,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:07,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:09,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:11,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:13,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:15,471 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:17,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:19,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:21,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:23,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:24,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:26,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:28,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:32,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:32,130 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:33,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:35,720 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:37,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:39,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:42,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:44,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:45,816 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:45,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:47,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:50,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:52,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:53,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:56,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:58,207 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:58,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:59,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:02,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:03,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:05,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:08,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:10,070 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:10,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:12,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:14,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:16,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:18,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:21,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:21,179 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:22,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:24,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:26,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:28,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:28,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:30,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:32,406 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:34,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:36,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:36,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:38,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:40,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:43,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:43,112 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1689, 'learning_rate': 8.634146341463413e-05, 'epoch': 4.22} +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:46,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:46,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:50,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:50,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:53,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:57,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:57,643 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:01,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:01,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:04,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:04,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:08,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:08,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:08,609 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:12,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:15,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:15,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:19,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:19,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:22,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:22,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:26,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:30,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:30,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:33,541 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:33,541 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:38,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:38,031 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:41,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:41,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3293, 'learning_rate': 8.536585365853658e-05, 'epoch': 4.23} +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:45,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:48,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:48,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:52,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:52,114 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:55,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:59,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:59,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:02,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:02,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:06,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:09,517 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:09,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:09,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:13,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:13,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:16,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:19,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:19,828 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:23,242 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:23,242 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:26,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:30,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:30,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2403, 'learning_rate': 8.439024390243901e-05, 'epoch': 4.24} +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2498, 'learning_rate': 8.390243902439023e-05, 'epoch': 4.25} + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1854, 'learning_rate': 8.341463414634146e-05, 'epoch': 4.25} + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1997, 'learning_rate': 8.292682926829268e-05, 'epoch': 4.26} + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2298, 'learning_rate': 8.24390243902439e-05, 'epoch': 4.26} + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1632, 'learning_rate': 8.195121951219513e-05, 'epoch': 4.26} +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1737, 'learning_rate': 8.146341463414633e-05, 'epoch': 4.27} +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1601, 'learning_rate': 8.097560975609755e-05, 'epoch': 4.27} +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1663, 'learning_rate': 8.048780487804878e-05, 'epoch': 4.28} +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.187, 'learning_rate': 7.999999999999999e-05, 'epoch': 4.28} +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1445, 'learning_rate': 7.951219512195121e-05, 'epoch': 4.29} +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1499, 'learning_rate': 7.902439024390244e-05, 'epoch': 4.29} +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1406, 'learning_rate': 7.853658536585364e-05, 'epoch': 4.3} +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1639, 'learning_rate': 7.804878048780487e-05, 'epoch': 4.3} +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1295, 'learning_rate': 7.756097560975609e-05, 'epoch': 4.3} + 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1336, 'learning_rate': 7.707317073170731e-05, 'epoch': 4.31} + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1371, 'learning_rate': 7.609756097560976e-05, 'epoch': 4.32} + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1307, 'learning_rate': 7.560975609756096e-05, 'epoch': 4.32} + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1316, 'learning_rate': 7.512195121951219e-05, 'epoch': 4.33} + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1358, 'learning_rate': 7.46341463414634e-05, 'epoch': 4.33} + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1215, 'learning_rate': 7.414634146341462e-05, 'epoch': 4.34} + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1351, 'learning_rate': 7.365853658536584e-05, 'epoch': 4.34} + 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▊ | 969/1115 [6:09:37<59:10, 24.32s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▊ | 969/1115 [6:09:37<59:10, 24.32s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.121, 'learning_rate': 7.317073170731707e-05, 'epoch': 4.35} + 87%|███████████████████████████████████████████████████████████████████▊ | 969/1115 [6:09:37<59:10, 24.32s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▊ | 969/1115 [6:09:37<59:10, 24.32s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▊ | 969/1115 [6:09:37<59:10, 24.32s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|███████████████████████████████████████████████████████████████████▊ | 969/1115 [6:09:37<59:10, 24.32s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▊ | 969/1115 [6:09:37<59:10, 24.32s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:41,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:41,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:41,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:41,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:41,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.106, 'learning_rate': 7.21951219512195e-05, 'epoch': 4.35} +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1224, 'learning_rate': 7.170731707317072e-05, 'epoch': 4.36} + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1107, 'learning_rate': 7.121951219512194e-05, 'epoch': 4.36} + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1096, 'learning_rate': 7.073170731707317e-05, 'epoch': 4.37} + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1294, 'learning_rate': 7.024390243902439e-05, 'epoch': 4.37} + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1206, 'learning_rate': 6.97560975609756e-05, 'epoch': 4.38} +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:15,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:15,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:19,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:19,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:19,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:25,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:51:25,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:25,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:25,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:25,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:25,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▎ | 977/1115 [6:12:46<53:59, 23.47s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▎ | 977/1115 [6:12:46<53:59, 23.47s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████████▎ | 977/1115 [6:12:46<53:59, 23.47s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▎ | 977/1115 [6:12:46<53:59, 23.47s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▎ | 977/1115 [6:12:46<53:59, 23.47s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▎ | 977/1115 [6:12:46<53:59, 23.47s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▎ | 977/1115 [6:12:46<53:59, 23.47s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▎ | 977/1115 [6:12:46<53:59, 23.47s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1112, 'learning_rate': 6.878048780487805e-05, 'epoch': 4.39} +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1085, 'learning_rate': 6.829268292682925e-05, 'epoch': 4.39} +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:32,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:32,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:52:32,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:32,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 980/1115 [6:13:51<49:59, 22.22s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 980/1115 [6:13:51<49:59, 22.22s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:43,236 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:43,236 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:43,236 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:52:49,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:49,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:49,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:55,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:55,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:55,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:52:55,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:53:01,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:01,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:01,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:08,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:08,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:11,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:14,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:53:14,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:18,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▋ | 982/1115 [6:14:30<46:16, 20.88s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▋ | 982/1115 [6:14:30<46:16, 20.88s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:22,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:24,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:26,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:53:26,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:30,544 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:32,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:34,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:36,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:39,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:39,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:41,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:43,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:43,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:46,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:48,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:50,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:52,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:53:54,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:54,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:56,550 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:53:58,449 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:00,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:02,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:03,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:54:05,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:07,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:07,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:11,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:12,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:14,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:16,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:54:18,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:19,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:23,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:23,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:24,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:26,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:29,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:54:30,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:32,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:35,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:35,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:36,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:39,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:41,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:54:42,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:45,269 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:46,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:46,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:49,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:51,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:53,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:54:54,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:54,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:57,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:54:59,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:01,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:03,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:03,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:55:05,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:07,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:09,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:09,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:11,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:13,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:15,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:55:15,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:16,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:19,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:19,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:22,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:22,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:26,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:55:26,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:30,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:33,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:33,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:37,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:37,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:40,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:55:40,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:40,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:44,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:48,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:48,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:51,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:51,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:55:55,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:55,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:55:58,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:02,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:02,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:05,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:05,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:56:10,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:10,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:13,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:13,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3143, 'learning_rate': 6.0975609756097554e-05, 'epoch': 4.46} +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:17,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:20,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:20,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:56:24,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:24,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:27,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:31,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:31,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:34,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:34,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:56:38,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:38,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:41,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:41,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:45,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:45,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:48,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:56:48,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:52,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:55,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:55,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:58,976 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:56:58,976 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:02,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1964, 'learning_rate': 5.9999999999999995e-05, 'epoch': 4.47} +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2446, 'learning_rate': 5.951219512195121e-05, 'epoch': 4.47} +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1735, 'learning_rate': 5.9024390243902435e-05, 'epoch': 4.48} +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+03/26/2022 02:08:27 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
+{'eval_loss': 0.36502909660339355, 'eval_wer': 0.11207854026180088, 'eval_runtime': 567.0865, 'eval_samples_per_second': 4.659, 'eval_steps_per_second': 0.584, 'epoch': 4.48}