diff --git "a/wandb/run-20220326_171130-bdf5nvyg/files/output.log" "b/wandb/run-20220326_171130-bdf5nvyg/files/output.log"
--- "a/wandb/run-20220326_171130-bdf5nvyg/files/output.log"
+++ "b/wandb/run-20220326_171130-bdf5nvyg/files/output.log"
@@ -18463,3 +18463,6191 @@
 {'eval_loss': 0.3513650596141815, 'eval_wer': 0.10093216977389925, 'eval_runtime': 571.1505, 'eval_samples_per_second': 4.626, 'eval_steps_per_second': 0.58, 'epoch': 6.73}
 [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0772, 'learning_rate': 0.0001267630057803468, 'epoch': 6.73}
+[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0731, 'learning_rate': 0.00012658959537572252, 'epoch': 6.74}
+[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0637, 'learning_rate': 0.00012641618497109824, 'epoch': 6.74}
+[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 67%|█████████████████████████████████████████████████▉ | 1504/2230 [9:44:17<19:46:16, 98.04s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0737, 'learning_rate': 0.000126242774566474, 'epoch': 6.74}
+ 67%|█████████████████████████████████████████████████▉ | 1504/2230 [9:44:17<19:46:16, 98.04s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0885, 'learning_rate': 0.00012606936416184968, 'epoch': 6.75}
+ 67%|█████████████████████████████████████████████████▉ | 1504/2230 [9:44:17<19:46:16, 98.04s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0575, 'learning_rate': 0.00012589595375722543, 'epoch': 6.75}
+ 67%|█████████████████████████████████████████████████▉ | 1504/2230 [9:44:17<19:46:16, 98.04s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0623, 'learning_rate': 0.00012572254335260115, 'epoch': 6.76}
+ 67%|█████████████████████████████████████████████████▉ | 1504/2230 [9:44:17<19:46:16, 98.04s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0807, 'learning_rate': 0.00012554913294797687, 'epoch': 6.76}
+ 67%|█████████████████████████████████████████████████▉ | 1504/2230 [9:44:17<19:46:16, 98.04s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0606, 'learning_rate': 0.0001253757225433526, 'epoch': 6.77}
+ 67%|█████████████████████████████████████████████████▉ | 1504/2230 [9:44:17<19:46:16, 98.04s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 67%|█████████████████████████████████████████████████▉ | 1504/2230 [9:44:17<19:46:16, 98.04s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0596, 'learning_rate': 0.00012520231213872831, 'epoch': 6.77} + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.066, 'learning_rate': 0.00012502890173410404, 'epoch': 6.78} + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0648, 'learning_rate': 0.00012485549132947976, 'epoch': 6.78} + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0554, 'learning_rate': 0.00012468208092485548, 'epoch': 6.78} + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0601, 'learning_rate': 0.0001245086705202312, 'epoch': 6.79}
+ 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it]
+{'loss': 0.0526, 'learning_rate': 0.00012433526011560692, 'epoch': 6.79}
+ 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:00:33,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 03:00:39,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0653, 'learning_rate': 0.00012416184971098267, 'epoch': 6.8}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:00:39,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:01:05,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 68%|███████████████████████████████████████████████████ | 1517/2230 [9:49:39<4:51:33, 24.53s/it]
+{'loss': 0.0682, 'learning_rate': 0.00012398843930635836, 'epoch': 6.8}
+ 68%|███████████████████████████████████████████████████ | 1517/2230 [9:49:39<4:51:33, 24.53s/it]
+[WARNING|modeling_utils.py:388] 2022-03-27 03:01:20,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0612, 'learning_rate': 0.0001238150289017341, 'epoch': 6.81}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:01:20,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0616, 'learning_rate': 0.0001236416184971098, 'epoch': 6.81}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:01:20,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0609, 'learning_rate': 0.00012346820809248555, 'epoch': 6.82}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:02:23,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:02:27,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:02:35,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0565, 'learning_rate': 0.00012329479768786127, 'epoch': 6.82}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:02:35,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:02:51,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:02:55,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:02:55,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:02:55,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:02:55,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]g-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]g-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0499, 'learning_rate': 0.000123121387283237, 'epoch': 6.83} + 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]g-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]
+ 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]
+ 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]
+ 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0652, 'learning_rate': 0.0001229479768786127, 'epoch': 6.83}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:33,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:33,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:33,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:39,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:39,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:39,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:39,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:39,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:47,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:47,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:51,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:51,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:51,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:03:51,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:03:59,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:02,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:02,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:05,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:05,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0348, 'learning_rate': 0.00012260115606936415, 'epoch': 6.84}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:09,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:12,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:12,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:12,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:12,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:12,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:21,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:21,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:26,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:26,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0554, 'learning_rate': 0.00012242774566473987, 'epoch': 6.84}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:30,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:30,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:30,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:36,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:38,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:38,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:38,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:44,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:44,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:47,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:47,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:51,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:51,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:54,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:57,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:04:57,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:01,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:03,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:03,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:05,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:05,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 03:05:09,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:05:09,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:13,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:15,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:17,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:19,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▍ | 1529/2230 [9:53:48<3:40:04, 18.84s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:05:21,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▍ | 1529/2230 [9:53:48<3:40:04, 18.84s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:05:21,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:23,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:25,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:27,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:29,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:31,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:33,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:35,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▍ | 1530/2230 [9:54:04<3:29:24, 17.95s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:05:37,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▍ | 1530/2230 [9:54:04<3:29:24, 17.95s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:05:37,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:39,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:41,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:42,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:44,767 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:46,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:50,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▍ | 1531/2230 [9:54:19<3:18:12, 17.01s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:05:52,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▍ | 1531/2230 [9:54:19<3:18:12, 17.01s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:05:52,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:53,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:56,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:58,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:59,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:01,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:02,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▌ | 1532/2230 [9:54:33<3:08:51, 16.23s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:06:06,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▌ | 1532/2230 [9:54:33<3:08:51, 16.23s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:06:06,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:08,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:09,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:11,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:14,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:15,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▌ | 1533/2230 [9:54:46<2:56:09, 15.16s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:20,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:21,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:24,642 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:26,033 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:28,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▌ | 1534/2230 [9:54:57<2:42:12, 13.98s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:06:30,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:32,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:34,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:36,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:38,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▋ | 1535/2230 [9:55:07<2:27:58, 12.77s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:06:40,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:42,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:44,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:46,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▋ | 1536/2230 [9:55:16<2:13:29, 11.54s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:06:48,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:50,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:53,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:55,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:57,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:58,741 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:01,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▋ | 1538/2230 [9:55:31<1:49:01, 9.45s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:07:04,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:08,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:12,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:15,636 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:19,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:22,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:26,425 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:29,943 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▊ | 1539/2230 [9:56:00<2:56:45, 15.35s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:07:33,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:37,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:40,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:44,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:47,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:51,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:54,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:57,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▊ | 1540/2230 [9:56:28<3:40:22, 19.16s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:08:01,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:05,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:08,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:11,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:15,218 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:18,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:22,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:25,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 69%|███████████████████████████████████████████████████▊ | 1541/2230 [9:56:55<4:08:24, 21.63s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:32,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:35,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:39,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:42,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:45,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0729, 'learning_rate': 0.00011965317919075144, 'epoch': 6.91}
+{'loss': 0.0855, 'learning_rate': 0.00011947976878612715, 'epoch': 6.92}
+{'loss': 0.0607, 'learning_rate': 0.00011930635838150289, 'epoch': 6.92}
+{'loss': 0.0633, 'learning_rate': 0.0001191329479768786, 'epoch': 6.93}
+{'loss': 0.0864, 'learning_rate': 0.00011895953757225433, 'epoch': 6.93}
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0667, 'learning_rate': 0.00011878612716763005, 'epoch': 6.94} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0621, 'learning_rate': 0.00011861271676300578, 'epoch': 6.94} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0582, 'learning_rate': 0.00011843930635838149, 'epoch': 6.95} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0563, 'learning_rate': 0.00011826589595375722, 'epoch': 6.95} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▍ | 1551/2230 [10:01:11<4:38:01, 24.57s/it] Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▍ | 1551/2230 [10:01:11<4:38:01, 24.57s/it] Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0595, 'learning_rate': 0.00011809248554913293, 'epoch': 6.96} +[WARNING|modeling_utils.py:388] 2022-03-27 03:12:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:12:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:12:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:12:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:12:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:12:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:12:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:12:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:12:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0622, 'learning_rate': 0.00011791907514450866, 'epoch': 6.96} + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +{'loss': 0.0489, 'learning_rate': 0.00011774566473988439, 'epoch': 6.96} + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:13:40,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:13:40,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:13:44,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:13:44,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:13:44,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:13:44,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:13:44,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:13:52,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:13:52,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:13:52,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:13:52,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:13:52,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:03,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:03,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:03,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:09,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:09,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:14:09,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0699, 'learning_rate': 0.00011739884393063583, 'epoch': 6.97} +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:09,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:17,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:17,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:17,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:31,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:33,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:33,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:37,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:40,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:40,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:43,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:45,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:47,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:50,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:50,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:14:52,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:53,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:55,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:57,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:14:59,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:01,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:03,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:15:03,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:06,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:08,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:10,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:11,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:14,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:15,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:15:15,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:18,715 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:19,985 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:22,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:24,572 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:26,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:26,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:15:28,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:30,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:32,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:32,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:35,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:35,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:38,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:15:38,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:42,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:42,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:46,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:46,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:49,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:53,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:15:53,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:57,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:15:57,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:00,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:00,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.082, 'learning_rate': 0.00011618497109826587, 'epoch': 7.0} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:04,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:07,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:07,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:11,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:11,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:14,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:14,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0802, 'learning_rate': 0.0001160115606936416, 'epoch': 7.01} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0734, 'learning_rate': 0.00011583815028901733, 'epoch': 7.01} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.075, 'learning_rate': 0.00011566473988439306, 'epoch': 7.02} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.074, 'learning_rate': 0.00011549132947976877, 'epoch': 7.02} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0624, 'learning_rate': 0.0001153179190751445, 'epoch': 7.03} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0662, 'learning_rate': 0.00011514450867052021, 'epoch': 7.03} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0631, 'learning_rate': 0.00011497109826589594, 'epoch': 7.04} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0493, 'learning_rate': 0.00011479768786127166, 'epoch': 7.04} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0642, 'learning_rate': 0.00011462427745664738, 'epoch': 7.04} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0557, 'learning_rate': 0.0001144508670520231, 'epoch': 7.05} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0515, 'learning_rate': 0.00011427745664739884, 'epoch': 7.05} +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0615, 'learning_rate': 0.00011410404624277455, 'epoch': 7.06} + 71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0486, 'learning_rate': 0.00011393063583815028, 'epoch': 7.06} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0503, 'learning_rate': 0.00011375722543352599, 'epoch': 7.07} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0713, 'learning_rate': 0.00011358381502890172, 'epoch': 7.07} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0545, 'learning_rate': 0.00011341040462427744, 'epoch': 7.08} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0459, 'learning_rate': 0.00011323699421965318, 'epoch': 7.08} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0357, 'learning_rate': 0.00011306358381502888, 'epoch': 7.09} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0504, 'learning_rate': 0.00011289017341040462, 'epoch': 7.09} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0427, 'learning_rate': 0.00011271676300578033, 'epoch': 7.09} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0584, 'learning_rate': 0.00011254335260115606, 'epoch': 7.1} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0566, 'learning_rate': 0.00011236994219653178, 'epoch': 7.1} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0452, 'learning_rate': 0.0001121965317919075, 'epoch': 7.11} +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▋ | 1586/2230 [10:15:00<4:25:17, 24.72s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▋ | 1586/2230 [10:15:00<4:25:17, 24.72s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0471, 'learning_rate': 0.00011184971098265896, 'epoch': 7.12} +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0488, 'learning_rate': 0.00011167630057803466, 'epoch': 7.12} +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0331, 'learning_rate': 0.0001115028901734104, 'epoch': 7.13} +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0503, 'learning_rate': 0.00011132947976878612, 'epoch': 7.13} +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:28:26,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:28:26,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:28:26,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:28:26,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 71%|████████████████████████████████████████████████████▊ | 1591/2230 [10:16:59<4:13:09, 23.77s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1591/2230 [10:16:59<4:13:09, 23.77s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1591/2230 [10:16:59<4:13:09, 23.77s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:28:38,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:28:38,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:28:38,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:28:38,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:28:38,547 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:28:38,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:28:38,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:28:38,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0457, 'learning_rate': 0.00011098265895953756, 'epoch': 7.14} + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0453, 'learning_rate': 0.0001108092485549133, 'epoch': 7.14} + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.048, 'learning_rate': 0.000110635838150289, 'epoch': 7.15} + 71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it] Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:29:47,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:29:47,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:29:47,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:29:47,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:29:47,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:29:47,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:29:47,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:29:47,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:29:47,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.043, 'learning_rate': 0.00011046242774566474, 'epoch': 7.15} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:30:06,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:30:06,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:30:06,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:30:06,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:30:06,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:16,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:16,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:16,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:16,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|████████████████████████████████████████████████████▉ | 1596/2230 [10:18:52<3:57:22, 22.46s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|████████████████████████████████████████████████████▉ | 1596/2230 [10:18:52<3:57:22, 22.46s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0318, 'learning_rate': 0.00011028901734104044, 'epoch': 7.16} + 72%|████████████████████████████████████████████████████▉ | 1596/2230 [10:18:52<3:57:22, 22.46s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|████████████████████████████████████████████████████▉ | 1596/2230 [10:18:52<3:57:22, 22.46s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|████████████████████████████████████████████████████▉ | 1596/2230 [10:18:52<3:57:22, 22.46s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|████████████████████████████████████████████████████▉ | 1596/2230 [10:18:52<3:57:22, 22.46s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|████████████████████████████████████████████████████▉ | 1596/2230 [10:18:52<3:57:22, 22.46s/it]g-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:38,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:38,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:30:38,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:38,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:38,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:38,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0444, 'learning_rate': 0.00011011560693641618, 'epoch': 7.16} +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:38,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:53,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:53,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:30:53,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:53,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:53,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:30:53,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:05,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:05,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:05,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0316, 'learning_rate': 0.0001099421965317919, 'epoch': 7.17} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:05,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:13,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:13,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:13,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:19,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:19,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:19,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:31:26,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:26,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0392, 'learning_rate': 0.00010976878612716762, 'epoch': 7.17} +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:26,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:26,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:26,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:35,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:35,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:40,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:40,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:44,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:44,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:44,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:48,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:50,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:50,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:54,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:31:54,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:58,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:58,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:32:02,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:32:04,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:32:04,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0407, 'learning_rate': 0.00010942196531791907, 'epoch': 7.18} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:09,113 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:09,113 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:32:12,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:32:14,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:32:17,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:32:19,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:32:21,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:32:21,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:32:23,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:32:25,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:32:25,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:29,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:31,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:33,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:35,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:37,193 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:37,193 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▏ | 1603/2230 [10:21:06<3:12:30, 18.42s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:32:39,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:41,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:32:39,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:43,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:32:39,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:44,908 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 03:32:39,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:46,748 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:32:39,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:48,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:32:39,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:50,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:32:39,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▏ | 1604/2230 [10:21:21<3:01:24, 17.39s/it] Setting `use_cache=False`...1] 2022-03-27 03:32:39,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▏ | 1604/2230 [10:21:21<3:01:24, 17.39s/it] Setting `use_cache=False`...1] 2022-03-27 03:32:39,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▏ | 1604/2230 [10:21:21<3:01:24, 17.39s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:32:54,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:57,724 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:32:54,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:59,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:32:54,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:01,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:32:54,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:02,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:32:54,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:04,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:32:54,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▎ | 1605/2230 [10:21:35<2:50:15, 16.34s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:33:08,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▎ | 1605/2230 [10:21:35<2:50:15, 16.34s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:33:08,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:09,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:08,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:11,265 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 03:33:08,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:12,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:08,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:15,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:08,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:17,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:08,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▎ | 1606/2230 [10:21:47<2:37:52, 15.18s/it] Setting `use_cache=False`...1] 2022-03-27 03:33:08,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▎ | 1606/2230 [10:21:47<2:37:52, 15.18s/it] Setting `use_cache=False`...1] 2022-03-27 03:33:08,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:21,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:20,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:24,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:20,404 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:25,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:20,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:28,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:20,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:29,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:20,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▎ | 1607/2230 [10:21:59<2:26:56, 14.15s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:33:32,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▎ | 1607/2230 [10:21:59<2:26:56, 14.15s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:33:32,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:33,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:32,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:35,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:32,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:37,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:32,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:40,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:32,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▎ | 1608/2230 [10:22:09<2:13:38, 12.89s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:33:42,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▎ | 1608/2230 [10:22:09<2:13:38, 12.89s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:33:42,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:44,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:42,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:45,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:42,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:47,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:42,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 72%|█████████████████████████████████████████████████████▍ | 1609/2230 [10:22:18<2:00:25, 11.64s/it] Setting `use_cache=False`...1] 2022-03-27 03:33:42,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▍ | 1609/2230 [10:22:18<2:00:25, 11.64s/it] Setting `use_cache=False`...1] 2022-03-27 03:33:42,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:51,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:50,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:54,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:50,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:56,166 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:50,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▍ | 1610/2230 [10:22:25<1:47:07, 10.37s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:33:58,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▍ | 1610/2230 [10:22:25<1:47:07, 10.37s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:33:58,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:00,549 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 03:33:58,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:02,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:33:58,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▍ | 1611/2230 [10:22:32<1:34:49, 9.19s/it] Setting `use_cache=False`...1] 2022-03-27 03:33:58,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▍ | 1611/2230 [10:22:32<1:34:49, 9.19s/it] Setting `use_cache=False`...1] 2022-03-27 03:33:58,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▍ | 1611/2230 [10:22:32<1:34:49, 9.19s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▍ | 1611/2230 [10:22:32<1:34:49, 9.19s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:09,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:12,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:12,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:12,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:16,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:20,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:20,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:23,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:23,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:27,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:31,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:31,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▍ | 1612/2230 [10:23:01<2:37:04, 15.25s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▍ | 1612/2230 [10:23:01<2:37:04, 15.25s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0953, 'learning_rate': 0.00010751445086705201, 'epoch': 7.23} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:38,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:38,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:41,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:41,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:45,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:50,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:50,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:53,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:57,205 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:57,205 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:00,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▌ | 1613/2230 [10:23:31<3:20:57, 19.54s/it] Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▌ | 1613/2230 [10:23:31<3:20:57, 19.54s/it] Setting `use_cache=False`...1] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▌ | 1613/2230 [10:23:31<3:20:57, 19.54s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▌ | 1613/2230 [10:23:31<3:20:57, 19.54s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:07,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:11,348 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:11,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:14,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:14,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:18,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:21,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:21,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:25,369 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:25,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:28,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:28,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▌ | 1614/2230 [10:23:59<3:47:08, 22.12s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|█████████████████████████████████████████████████████▌ | 1614/2230 [10:23:59<3:47:08, 22.12s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:35,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:35,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:39,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:42,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:42,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:46,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:46,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:49,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0735, 'learning_rate': 0.00010699421965317919, 'epoch': 7.24} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0702, 'learning_rate': 0.0001068208092485549, 'epoch': 7.25} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0634, 'learning_rate': 0.00010664739884393063, 'epoch': 7.25} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0637, 'learning_rate': 0.00010647398843930635, 'epoch': 7.26} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0591, 'learning_rate': 0.00010630057803468207, 'epoch': 7.26} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0628, 'learning_rate': 0.00010612716763005779, 'epoch': 7.26} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0501, 'learning_rate': 0.00010595375722543353, 'epoch': 7.27} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0643, 'learning_rate': 0.00010578034682080923, 'epoch': 7.27} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0555, 'learning_rate': 0.00010560693641618497, 'epoch': 7.28} +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0606, 'learning_rate': 0.00010543352601156068, 'epoch': 7.28} + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0447, 'learning_rate': 0.00010526011560693641, 'epoch': 7.29} + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +{'loss': 0.0522, 'learning_rate': 0.00010508670520231213, 'epoch': 7.29} + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0526, 'learning_rate': 0.00010491329479768786, 'epoch': 7.3} + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.046, 'learning_rate': 0.00010473988439306357, 'epoch': 7.3} + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +{'loss': 0.0683, 'learning_rate': 0.0001045664739884393, 'epoch': 7.3} + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0516, 'learning_rate': 0.00010439306358381501, 'epoch': 7.31} + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0431, 'learning_rate': 0.00010421965317919075, 'epoch': 7.31} + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0407, 'learning_rate': 0.00010404624277456647, 'epoch': 7.32} + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0441, 'learning_rate': 0.00010369942196531791, 'epoch': 7.33} + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0361, 'learning_rate': 0.00010352601156069364, 'epoch': 7.33} + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:44:52,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:44:52,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:44:52,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:44:52,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:44:52,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:44:52,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0417, 'learning_rate': 0.00010335260115606935, 'epoch': 7.34} +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:26,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:26,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:26,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0548, 'learning_rate': 0.00010317919075144509, 'epoch': 7.34} +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:26,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:26,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:26,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:45:26,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:45:26,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:45:43,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:45:43,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:45:43,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:45:43,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:45:43,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:45:43,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0421, 'learning_rate': 0.00010283236994219653, 'epoch': 7.35} + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it] Setting `use_cache=False`...e computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:46:32,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:46:32,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:46:32,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:46:32,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 74%|██████████████████████████████████████████████████████▍ | 1640/2230 [10:35:08<3:55:35, 23.96s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 74%|██████████████████████████████████████████████████████▍ | 1640/2230 [10:35:08<3:55:35, 23.96s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0444, 'learning_rate': 0.00010265895953757225, 'epoch': 7.35} + 74%|██████████████████████████████████████████████████████▍ | 1640/2230 [10:35:08<3:55:35, 23.96s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 74%|██████████████████████████████████████████████████████▍ | 1640/2230 [10:35:08<3:55:35, 23.96s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.037, 'learning_rate': 0.00010248554913294798, 'epoch': 7.36}
+ 74%|██████████████████████████████████████████████████████▍ | 1642/2230 [10:35:54<3:49:51, 23.45s/it]
+{'loss': 0.046, 'learning_rate': 0.00010231213872832369, 'epoch': 7.36}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:47:48,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0377, 'learning_rate': 0.00010213872832369942, 'epoch': 7.37}
+{'loss': 0.043, 'learning_rate': 0.00010196531791907513, 'epoch': 7.37}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:48:15,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:48:19,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:48:23,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0417, 'learning_rate': 0.00010179190751445086, 'epoch': 7.38}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:48:39,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 74%|██████████████████████████████████████████████████████▌ | 1646/2230 [10:37:23<3:37:31, 22.35s/it]
+[WARNING|modeling_utils.py:388] 2022-03-27 03:48:57,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:49:10,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 03:49:16,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0432, 'learning_rate': 0.0001014450867052023, 'epoch': 7.39}
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:49:28,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 03:49:36,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0434, 'learning_rate': 0.00010127167630057803, 'epoch': 7.39}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:49:44,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:49:50,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:49:55,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 74%|██████████████████████████████████████████████████████▋ | 1649/2230 [10:38:24<3:23:55, 21.06s/it]
+{'loss': 0.0352, 'learning_rate': 0.00010109826589595376, 'epoch': 7.39}
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:01,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 03:50:05,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:50:11,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 03:50:13,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0402, 'learning_rate': 0.00010092485549132947, 'epoch': 7.4}
+[WARNING|modeling_utils.py:388] 2022-03-27 03:50:21,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:25,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:31,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 03:50:35,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0381, 'learning_rate': 0.0001007514450867052, 'epoch': 7.4}
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:39,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:41,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:43,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:46,108 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:48,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:50,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:52,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:54,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:56,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:58,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:51:00,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 03:51:00,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:04,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:06,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:08,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:08,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:51:10,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:12,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:14,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:16,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:18,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:19,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:21,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:51:23,693 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:23,693 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:25,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:27,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:30,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:32,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:34,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:51:35,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:35,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:37,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:41,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:42,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:44,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:47,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:51:48,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:50,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:50,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:53,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:55,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:56,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:51:59,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:52:00,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:03,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:03,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:04,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:07,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:09,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:10,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:52:10,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:13,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:15,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:17,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:19,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:19,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:20,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:52:23,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:25,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:27,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:27,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:29,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:31,718 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:33,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:52:34,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:34,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:37,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:37,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:41,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:41,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:44,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:52:44,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:48,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:48,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:51,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:55,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:55,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:52:59,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:52:59,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:02,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:02,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:06,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:06,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:09,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:09,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:53:13,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:13,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:17,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:17,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:21,596 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:21,596 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:25,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:53:25,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:28,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:32,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:32,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0789, 'learning_rate': 9.867052023121385e-05, 'epoch': 7.46} +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:35,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:35,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:39,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:53:39,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:42,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:46,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:46,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:49,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:49,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:53,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:53:56,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:53:56,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:00,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:00,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0637, 'learning_rate': 9.849710982658958e-05, 'epoch': 7.46} +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:03,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:07,215 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:07,215 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:10,679 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:10,679 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:14,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0571, 'learning_rate': 9.83236994219653e-05, 'epoch': 7.47} +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0686, 'learning_rate': 9.815028901734104e-05, 'epoch': 7.47} +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0591, 'learning_rate': 9.797687861271675e-05, 'epoch': 7.48} +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0504, 'learning_rate': 9.780346820809248e-05, 'epoch': 7.48} +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0571, 'learning_rate': 9.763005780346819e-05, 'epoch': 7.48} +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.056, 'learning_rate': 9.745664739884392e-05, 'epoch': 7.49} +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0677, 'learning_rate': 9.728323699421964e-05, 'epoch': 7.49} +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0481, 'learning_rate': 9.710982658959536e-05, 'epoch': 7.5} +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▌ | 1673/2230 [10:46:32<4:04:57, 26.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▌ | 1673/2230 [10:46:32<4:04:57, 26.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0565, 'learning_rate': 9.693641618497108e-05, 'epoch': 7.5} + 75%|███████████████████████████████████████████████████████▌ | 1673/2230 [10:46:32<4:04:57, 26.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▌ | 1673/2230 [10:46:32<4:04:57, 26.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▌ | 1673/2230 [10:46:32<4:04:57, 26.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 75%|███████████████████████████████████████████████████████▌ | 1673/2230 [10:46:32<4:04:57, 26.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.052, 'learning_rate': 9.676300578034682e-05, 'epoch': 7.51}
+{'loss': 0.0489, 'learning_rate': 9.658959537572253e-05, 'epoch': 7.51}
+{'loss': 0.0433, 'learning_rate': 9.641618497109826e-05, 'epoch': 7.52}
+ 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0491, 'learning_rate': 9.624277456647398e-05, 'epoch': 7.52}
+{'loss': 0.0445, 'learning_rate': 9.60693641618497e-05, 'epoch': 7.52}
+{'loss': 0.0593, 'learning_rate': 9.589595375722542e-05, 'epoch': 7.53}
+{'loss': 0.0491, 'learning_rate': 9.572254335260116e-05, 'epoch': 7.53}
+ 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0614, 'learning_rate': 9.554913294797686e-05, 'epoch': 7.54} + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0562, 'learning_rate': 9.53757225433526e-05, 'epoch': 7.54} + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0352, 'learning_rate': 9.52023121387283e-05, 'epoch': 7.55} + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0484, 'learning_rate': 9.502890173410404e-05, 'epoch': 7.55} + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0393, 'learning_rate': 9.485549132947976e-05, 'epoch': 7.56} + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0397, 'learning_rate': 9.468208092485548e-05, 'epoch': 7.56} + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0423, 'learning_rate': 9.45086705202312e-05, 'epoch': 7.57} + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0478, 'learning_rate': 9.433526011560693e-05, 'epoch': 7.57} +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0464, 'learning_rate': 9.416184971098264e-05, 'epoch': 7.57} + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0414, 'learning_rate': 9.398843930635838e-05, 'epoch': 7.58} + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.051, 'learning_rate': 9.38150289017341e-05, 'epoch': 7.58} + 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0405, 'learning_rate': 9.364161849710982e-05, 'epoch': 7.59} +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0464, 'learning_rate': 9.346820809248554e-05, 'epoch': 7.59} +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:26,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:26,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:30,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:30,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:30,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:06:30,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████▏ | 1694/2230 [10:55:06<3:26:16, 23.09s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████▏ | 1694/2230 [10:55:06<3:26:16, 23.09s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0312, 'learning_rate': 9.329479768786127e-05, 'epoch': 7.6} + 76%|████████████████████████████████████████████████████████▏ | 1694/2230 [10:55:06<3:26:16, 23.09s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████▏ | 1694/2230 [10:55:06<3:26:16, 23.09s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████▏ | 1694/2230 [10:55:06<3:26:16, 23.09s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████▏ | 1694/2230 [10:55:06<3:26:16, 23.09s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0429, 'learning_rate': 9.312138728323698e-05, 'epoch': 7.6} +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:21,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:21,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0376, 'learning_rate': 9.294797687861271e-05, 'epoch': 7.61} +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:25,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:25,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:25,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:25,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:07:25,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:25,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:07:37,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:07:37,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:07:37,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:07:37,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0374, 'learning_rate': 9.277456647398842e-05, 'epoch': 7.61} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:07:37,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:07:37,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:49,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:49,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:49,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:49,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:49,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:49,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:07:49,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:07:49,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0412, 'learning_rate': 9.260115606936415e-05, 'epoch': 7.61} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:06,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:06,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:10,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:10,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:10,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:08:10,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:18,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:18,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:22,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:22,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.029, 'learning_rate': 9.242774566473988e-05, 'epoch': 7.62} +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:22,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:28,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:08:28,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:28,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:34,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:34,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:38,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:38,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:38,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:38,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:44,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:44,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:49,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:49,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:52,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:08:55,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:08:55,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:59,316 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████▍ | 1701/2230 [10:57:28<2:56:01, 19.97s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|████████████████████████████████████████████████████████▍ | 1701/2230 [10:57:28<2:56:01, 19.97s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:09:03,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:09:05,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:09:07,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:09:09,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:09:11,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:09:13,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:09:13,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:17,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:17,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:19,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:21,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:23,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:25,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:27,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:29,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:31,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:33,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:33,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:35,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:37,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:39,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:41,314 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:43,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:44,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:48,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:48,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:50,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:52,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:53,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:55,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:57,356 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:59,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:02,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:02,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:04,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:05,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:07,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:10,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:11,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:13,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:13,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:16,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:18,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:20,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:21,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:24,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:26,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:26,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:28,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:30,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:32,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:34,514 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:36,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:36,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:39,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:41,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:42,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:44,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:44,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:46,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:49,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:51,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:52,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:52,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:54,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:57,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:59,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:59,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0469, 'learning_rate': 9.034682080924854e-05, 'epoch': 7.67} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:02,790 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:02,790 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:06,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:06,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:10,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:13,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:13,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:17,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:17,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:21,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:21,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:24,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:28,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:28,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:28,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:31,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:31,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:35,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:35,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:39,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:42,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:42,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:42,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:47,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:50,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:50,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:54,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:54,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:54,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:57,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:01,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:01,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:04,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:04,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:08,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:08,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:11,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:15,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:15,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:18,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:18,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:22,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:25,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:25,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0592, 'learning_rate': 8.982658959537573e-05, 'epoch': 7.69} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:29,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:29,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:32,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:35,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:35,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:39,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:39,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:42,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0604, 'learning_rate': 8.965317919075143e-05, 'epoch': 7.69} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0669, 'learning_rate': 8.947976878612717e-05, 'epoch': 7.7} + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0608, 'learning_rate': 8.930635838150287e-05, 'epoch': 7.7} + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0554, 'learning_rate': 8.913294797687861e-05, 'epoch': 7.7} + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0624, 'learning_rate': 8.895953757225433e-05, 'epoch': 7.71} + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0649, 'learning_rate': 8.878612716763005e-05, 'epoch': 7.71} + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.057, 'learning_rate': 8.861271676300577e-05, 'epoch': 7.72} + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0445, 'learning_rate': 8.84393063583815e-05, 'epoch': 7.72} + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0504, 'learning_rate': 8.826589595375721e-05, 'epoch': 7.73} + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0529, 'learning_rate': 8.809248554913295e-05, 'epoch': 7.73} + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0628, 'learning_rate': 8.791907514450865e-05, 'epoch': 7.74} + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.047, 'learning_rate': 8.774566473988439e-05, 'epoch': 7.74} + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0438, 'learning_rate': 8.757225433526011e-05, 'epoch': 7.74} + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0491, 'learning_rate': 8.739884393063584e-05, 'epoch': 7.75} + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0572, 'learning_rate': 8.722543352601155e-05, 'epoch': 7.75} + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0363, 'learning_rate': 8.705202312138728e-05, 'epoch': 7.76} + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0537, 'learning_rate': 8.687861271676299e-05, 'epoch': 7.76} + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0313, 'learning_rate': 8.670520231213873e-05, 'epoch': 7.77} + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0508, 'learning_rate': 8.653179190751445e-05, 'epoch': 7.77} + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.043, 'learning_rate': 8.635838150289017e-05, 'epoch': 7.78} + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0354, 'learning_rate': 8.618497109826589e-05, 'epoch': 7.78} + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0435, 'learning_rate': 8.601156069364162e-05, 'epoch': 7.78} + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0427, 'learning_rate': 8.583815028901733e-05, 'epoch': 7.79} + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0411, 'learning_rate': 8.566473988439306e-05, 'epoch': 7.79} + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0423, 'learning_rate': 8.549132947976878e-05, 'epoch': 7.8} + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0406, 'learning_rate': 8.53179190751445e-05, 'epoch': 7.8} +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▊ | 1741/2230 [11:12:23<3:13:06, 23.69s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▊ | 1741/2230 [11:12:23<3:13:06, 23.69s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0399, 'learning_rate': 8.514450867052023e-05, 'epoch': 7.81} +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0419, 'learning_rate': 8.497109826589596e-05, 'epoch': 7.81} +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:28,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:28,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:24:28,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:28,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:36,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:36,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:36,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:36,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:36,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0363, 'learning_rate': 8.479768786127167e-05, 'epoch': 7.82} +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:46,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:46,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:50,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:50,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0392, 'learning_rate': 8.46242774566474e-05, 'epoch': 7.82} +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0461, 'learning_rate': 8.445086705202311e-05, 'epoch': 7.83} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▉ | 1746/2230 [11:14:14<2:59:44, 22.28s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:49,605 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:49,605 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:53,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:25:53,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:53,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:53,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:25:53,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:03,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:03,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▉ | 1747/2230 [11:14:35<2:56:11, 21.89s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|█████████████████████████████████████████████████████████▉ | 1747/2230 [11:14:35<2:56:11, 21.89s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0416, 'learning_rate': 8.410404624277456e-05, 'epoch': 7.83} + 78%|█████████████████████████████████████████████████████████▉ | 1747/2230 [11:14:35<2:56:11, 21.89s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▉ | 1747/2230 [11:14:35<2:56:11, 21.89s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|█████████████████████████████████████████████████████████▉ | 1747/2230 [11:14:35<2:56:11, 21.89s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:18,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:18,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:18,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:26:24,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:24,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|██████████████████████████████████████████████████████████ | 1748/2230 [11:14:56<2:51:58, 21.41s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|██████████████████████████████████████████████████████████ | 1748/2230 [11:14:56<2:51:58, 21.41s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.036, 'learning_rate': 8.393063583815028e-05, 'epoch': 7.84} + 78%|██████████████████████████████████████████████████████████ | 1748/2230 [11:14:56<2:51:58, 21.41s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:34,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:34,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:26:34,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:40,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:43,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:43,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:43,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:49,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:49,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0414, 'learning_rate': 8.3757225433526e-05, 'epoch': 7.84} +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:49,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:49,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:49,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:58,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:26:58,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:03,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:03,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:03,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:03,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:03,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0326, 'learning_rate': 8.358381502890174e-05, 'epoch': 7.85} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:13,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:13,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:27:17,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:27:19,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:27:19,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:23,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:23,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:27:27,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:27:27,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:27:30,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:27:32,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:27:32,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:36,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:38,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:40,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:40,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:27:44,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:27:44,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0486, 'learning_rate': 8.323699421965317e-05, 'epoch': 7.86} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:48,052 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:50,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:52,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:56,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:58,154 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:58,154 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:00,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:02,135 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:04,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:05,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:07,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:09,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:11,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:13,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:13,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:15,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:18,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:20,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:22,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:23,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:25,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:28,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:28,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:30,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:32,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:35,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:36,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:38,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:41,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:41,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:42,631 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:44,052 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:46,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:49,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:50,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:52,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:52,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:54,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:56,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:59,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:00,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:02,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:02,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:04,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:06,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:08,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:10,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:10,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:12,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:14,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:16,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:16,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:19,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:21,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:23,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:24,850 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:24,850 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1026, 'learning_rate': 8.167630057803468e-05, 'epoch': 7.9} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:28,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:31,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:31,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:35,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:35,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:38,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:38,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:42,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:45,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:45,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:49,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:49,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:53,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:53,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.075, 'learning_rate': 8.150289017341039e-05, 'epoch': 7.9} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:56,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:00,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:00,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:03,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:03,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:07,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:07,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:11,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:11,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:15,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:18,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:18,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:21,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:21,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0526, 'learning_rate': 8.132947976878612e-05, 'epoch': 7.91} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:25,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:28,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:28,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:32,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:32,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:35,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:39,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:42,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:45,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:49,184 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:52,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:56,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:59,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:31:02,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:31:06,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:31:09,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:31:12,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:31:16,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0562, 'learning_rate': 8.098265895953756e-05, 'epoch': 7.91}
+{'loss': 0.0543, 'learning_rate': 8.080924855491328e-05, 'epoch': 7.92}
+ 79%|██████████████████████████████████████████████████████████▋ | 1767/2230 [11:20:37<3:10:37, 24.70s/it]
+{'loss': 0.0638, 'learning_rate': 8.063583815028902e-05, 'epoch': 7.92}
+{'loss': 0.0364, 'learning_rate': 8.046242774566472e-05, 'epoch': 7.93}
+{'loss': 0.0621, 'learning_rate': 8.028901734104046e-05, 'epoch': 7.93}
+ 79%|██████████████████████████████████████████████████████████▋ | 1770/2230 [11:21:54<3:14:14, 25.34s/it]
+{'loss': 0.0567, 'learning_rate': 8.011560693641617e-05, 'epoch': 7.94}
+ 79%|██████████████████████████████████████████████████████████▊ | 1771/2230 [11:22:19<3:12:44, 25.20s/it]
+{'loss': 0.0482, 'learning_rate': 7.99421965317919e-05, 'epoch': 7.94}
+ 79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it]
+{'loss': 0.0518, 'learning_rate': 7.976878612716762e-05, 'epoch': 7.95}
+{'loss': 0.0447, 'learning_rate': 7.959537572254334e-05, 'epoch': 7.95}
+{'loss': 0.0483, 'learning_rate': 7.942196531791906e-05, 'epoch': 7.96}
+{'loss': 0.0337, 'learning_rate': 7.92485549132948e-05, 'epoch': 7.96}
+ 79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:35:52,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:35:52,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:35:56,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:35:56,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:36:00,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:36:00,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:36:00,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:36:00,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:36:00,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:36:00,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:36:00,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:36:00,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0472, 'learning_rate': 7.890173410404624e-05, 'epoch': 7.97}
+[WARNING|modeling_utils.py:388] 2022-03-27 04:36:16,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 04:36:27,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0442, 'learning_rate': 7.872832369942196e-05, 'epoch': 7.97}
+[WARNING|modeling_utils.py:388] 2022-03-27 04:36:37,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:36:45,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 04:36:49,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0333, 'learning_rate': 7.855491329479768e-05, 'epoch': 7.98}
+[WARNING|modeling_utils.py:388] 2022-03-27 04:36:55,985 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 04:36:58,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:02,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:05,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:07,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:10,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:12,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:14,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:16,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:18,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:20,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:21,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:23,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:25,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:27,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:29,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:31,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:34,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:36,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:37,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:40,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:42,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:44,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:46,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:48,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:50,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:52,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:54,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:55,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:59,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:03,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:06,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:10,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:13,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:17,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:21,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:24,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0672, 'learning_rate': 7.751445086705202e-05, 'epoch': 8.0}
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:28,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:31,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:35,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0566, 'learning_rate': 7.734104046242774e-05, 'epoch': 8.01}
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0577, 'learning_rate': 7.716763005780346e-05, 'epoch': 8.01}
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0516, 'learning_rate': 7.699421965317918e-05, 'epoch': 8.02} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1426, 'learning_rate': 7.682080924855491e-05, 'epoch': 8.02} +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it]
+{'loss': 0.0846, 'learning_rate': 7.647398843930635e-05, 'epoch': 8.03}
+{'loss': 0.0666, 'learning_rate': 7.630057803468207e-05, 'epoch': 8.04}
+{'loss': 0.0505, 'learning_rate': 7.61271676300578e-05, 'epoch': 8.04}
+{'loss': 0.0439, 'learning_rate': 7.595375722543352e-05, 'epoch': 8.04}
+{'loss': 0.0355, 'learning_rate': 7.578034682080925e-05, 'epoch': 8.05}
+{'loss': 0.0469, 'learning_rate': 7.560693641618496e-05, 'epoch': 8.05}
+{'loss': 0.0565, 'learning_rate': 7.543352601156069e-05, 'epoch': 8.06}
+ 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|█████████████████████████████���█████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0368, 'learning_rate': 7.52601156069364e-05, 'epoch': 8.06} + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|███████████████████████████████████████████████████████████▋ | 1799/2230 [11:33:10<3:06:31, 25.97s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0524, 'learning_rate': 7.508670520231213e-05, 'epoch': 8.07}
+{'loss': 0.0369, 'learning_rate': 7.491329479768785e-05, 'epoch': 8.07}
+{'loss': 0.0475, 'learning_rate': 7.473988439306357e-05, 'epoch': 8.08}
+{'loss': 0.049, 'learning_rate': 7.45664739884393e-05, 'epoch': 8.08}
+ 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0297, 'learning_rate': 7.421965317919074e-05, 'epoch': 8.09}
+{'loss': 0.099, 'learning_rate': 7.404624277456646e-05, 'epoch': 8.09}
+{'loss': 0.0438, 'learning_rate': 7.387283236994219e-05, 'epoch': 8.1}
+{'loss': 0.1455, 'learning_rate': 7.369942196531791e-05, 'epoch': 8.1}
+ 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0296, 'learning_rate': 7.352601156069363e-05, 'epoch': 8.11} + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0297, 'learning_rate': 7.335260115606935e-05, 'epoch': 8.11} + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0351, 'learning_rate': 7.317919075144507e-05, 'epoch': 8.12} +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1393, 'learning_rate': 7.30057803468208e-05, 'epoch': 8.12} +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1093, 'learning_rate': 7.30057803468208e-05, 'epoch': 8.13} +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.08, 'learning_rate': 7.30057803468208e-05, 'epoch': 8.13} +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0995, 'learning_rate': 7.283236994219653e-05, 'epoch': 8.13} +[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0839, 'learning_rate': 7.265895953757225e-05, 'epoch': 8.14} +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:28,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:28,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:33,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:51:33,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:33,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2315, 'learning_rate': 7.248554913294797e-05, 'epoch': 8.14} +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:33,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:33,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:42,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:42,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2594, 'learning_rate': 7.231213872832369e-05, 'epoch': 8.15} +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:07,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:07,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:07,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:07,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:07,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:52:07,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1639, 'learning_rate': 7.213872832369941e-05, 'epoch': 8.15} + 82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:40,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:40,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:40,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:52:44,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:44,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:44,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:44,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:52,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:52,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:52,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:52:52,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:52:52,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:02,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:02,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1424, 'learning_rate': 7.179190751445085e-05, 'epoch': 8.16} +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:02,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:02,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:02,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:53:02,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:14,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:14,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:14,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:14,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:22,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:22,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3705, 'learning_rate': 7.161849710982659e-05, 'epoch': 8.17} +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:22,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:29,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:29,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:29,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:35,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:35,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:39,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:39,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:39,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:39,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:46,033 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:46,033 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:50,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:50,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:53:50,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:56,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:53:56,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:00,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▍ | 1823/2230 [11:42:30<2:18:40, 20.44s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▍ | 1823/2230 [11:42:30<2:18:40, 20.44s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:04,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:54:04,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:04,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:10,289 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:12,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:12,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:16,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:18,896 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 82%|████████████████████████████████████████████████████████████▌ | 1824/2230 [11:42:48<2:14:16, 19.84s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|████████████████████████████████████████████████████████████▌ | 1824/2230 [11:42:48<2:14:16, 19.84s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:22,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:24,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:27,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:27,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:31,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:33,337 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:35,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:37,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:37,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:37,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.184, 'learning_rate': 7.092485549132947e-05, 'epoch': 8.18} +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:43,289 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:45,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:54:47,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:49,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:51,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:53,442 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:55,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:55,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:54:57,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:54:59,442 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:01,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:03,205 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:05,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:06,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:08,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:08,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:55:10,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:12,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:14,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:17,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:19,248 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:20,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:22,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:55:22,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:25,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:27,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:28,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:32,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:33,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:35,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:55:35,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:38,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:39,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:42,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:43,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:46,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:47,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:55:47,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:50,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:51,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:53,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:56,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:58,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:55:58,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:56:00,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:02,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:03,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:05,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:05,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:07,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:10,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:56:12,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:14,265 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:14,265 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:16,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:18,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:20,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:20,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1203, 'learning_rate': 6.936416184971097e-05, 'epoch': 8.22} +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:24,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:24,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:28,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:28,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:31,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:31,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:35,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:56:35,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:39,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:42,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:42,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:46,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:46,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:46,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:56:49,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:53,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:53,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:57,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:56:57,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:00,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:00,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:57:04,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:07,640 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:07,640 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:11,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:11,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:14,667 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:18,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:57:18,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1833, 'learning_rate': 6.901734104046242e-05, 'epoch': 8.23} +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:21,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:21,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:25,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:25,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:28,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:32,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:57:32,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:35,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:35,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:39,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:42,834 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:42,834 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:46,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:57:46,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1643, 'learning_rate': 6.884393063583815e-05, 'epoch': 8.24} +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:49,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:49,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:53,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:56,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:57:56,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:01,255 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:58:01,255 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1469, 'learning_rate': 6.867052023121387e-05, 'epoch': 8.24} +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0913, 'learning_rate': 6.849710982658959e-05, 'epoch': 8.25} +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0403, 'learning_rate': 6.832369942196531e-05, 'epoch': 8.25} +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 04:58:04,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.096, 'learning_rate': 6.815028901734103e-05, 'epoch': 8.26}
+ 83%|█████████████████████████████████████████████████████████████ | 1842/2230 [11:48:31<2:50:12, 26.32s/it]
+{'loss': 0.2121, 'learning_rate': 6.780346820809248e-05, 'epoch': 8.26}
+{'loss': 0.1506, 'learning_rate': 6.76300578034682e-05, 'epoch': 8.27}
+{'loss': 0.0472, 'learning_rate': 6.745664739884392e-05, 'epoch': 8.27}
+{'loss': 0.0873, 'learning_rate': 6.728323699421964e-05, 'epoch': 8.28}
+{'loss': 0.1552, 'learning_rate': 6.710982658959537e-05, 'epoch': 8.28}
+{'loss': 0.047, 'learning_rate': 6.693641618497109e-05, 'epoch': 8.29}
+ 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]
+{'loss': 0.0891, 'learning_rate': 6.658959537572254e-05, 'epoch': 8.3}
+{'loss': 0.0929, 'learning_rate': 6.641618497109826e-05, 'epoch': 8.3}
+{'loss': 0.0883, 'learning_rate': 6.624277456647398e-05, 'epoch': 8.3}
+ 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0795, 'learning_rate': 6.60693641618497e-05, 'epoch': 8.31} + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1435, 'learning_rate': 6.589595375722542e-05, 'epoch': 8.31} + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1741, 'learning_rate': 6.572254335260114e-05, 'epoch': 8.32} + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0961, 'learning_rate': 6.554913294797688e-05, 'epoch': 8.32} + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████��███████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1966, 'learning_rate': 6.53757225433526e-05, 'epoch': 8.33} +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1273, 'learning_rate': 6.520231213872832e-05, 'epoch': 8.33} +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0822, 'learning_rate': 6.502890173410404e-05, 'epoch': 8.34} +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0902, 'learning_rate': 6.485549132947976e-05, 'epoch': 8.34} + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|██��██████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0709, 'learning_rate': 6.468208092485548e-05, 'epoch': 8.35} + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▊ | 1862/2230 [11:56:57<2:26:48, 23.94s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▊ | 1862/2230 [11:56:57<2:26:48, 23.94s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▊ | 1862/2230 [11:56:57<2:26:48, 23.94s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▊ | 1862/2230 [11:56:57<2:26:48, 23.94s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 83%|█████████████████████████████████████████████████████████████▊ | 1862/2230 [11:56:57<2:26:48, 23.94s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▊ | 1862/2230 [11:56:57<2:26:48, 23.94s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▊ | 1862/2230 [11:56:57<2:26:48, 23.94s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▊ | 1862/2230 [11:56:57<2:26:48, 23.94s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▊ | 1862/2230 [11:56:57<2:26:48, 23.94s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:08:48,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:08:48,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:08:48,784 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:08:48,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0312, 'learning_rate': 6.433526011560694e-05, 'epoch': 8.35} +[WARNING|modeling_bart.py:1051] 2022-03-27 05:08:48,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:08:48,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:09:00,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:09:00,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:09:00,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:09:00,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:09:00,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:09:00,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:09:00,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:09:00,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0409, 'learning_rate': 6.416184971098266e-05, 'epoch': 8.36} + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0308, 'learning_rate': 6.398843930635838e-05, 'epoch': 8.36} + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0323, 'learning_rate': 6.38150289017341e-05, 'epoch': 8.37} + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:13,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:13,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:13,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:10:13,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:13,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:13,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:13,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0841, 'learning_rate': 6.364161849710982e-05, 'epoch': 8.37} +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:13,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:29,703 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:29,703 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:10:33,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:33,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:33,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:33,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:33,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:33,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.086, 'learning_rate': 6.346820809248554e-05, 'epoch': 8.38} +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0813, 'learning_rate': 6.329479768786126e-05, 'epoch': 8.38} +[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:11:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:11:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:11:31,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:11:31,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:11:31,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:36,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:36,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:36,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:11:43,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:43,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:43,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:49,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:49,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0999, 'learning_rate': 6.294797687861272e-05, 'epoch': 8.39} +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:49,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:55,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:11:55,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:55,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:55,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:11:55,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:05,594 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:05,594 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████ | 1872/2230 [12:00:37<2:05:14, 20.99s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 84%|██████████████████████████████████████████████████████████████ | 1872/2230 [12:00:37<2:05:14, 20.99s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:11,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:11,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:16,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:16,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:20,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:20,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:12:20,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:26,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:26,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:26,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.038, 'learning_rate': 6.260115606936416e-05, 'epoch': 8.4} +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:32,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:34,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:34,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:38,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:38,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:42,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:44,594 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:46,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:12:46,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0383, 'learning_rate': 6.242774566473988e-05, 'epoch': 8.4} +[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:50,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:52,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:55,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:57,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:59,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:01,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:03,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:03,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:03,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:07,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:09,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:11,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:13,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:15,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:17,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:19,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:19,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▎ | 1876/2230 [12:01:48<1:47:41, 18.25s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:13:21,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:23,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:21,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:25,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:21,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:26,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:21,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:28,762 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:21,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:30,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:21,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:34,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:21,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▎ | 1877/2230 [12:02:03<1:41:09, 17.19s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:13:35,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▎ | 1877/2230 [12:02:03<1:41:09, 17.19s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:13:35,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:37,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:35,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:39,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:35,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:41,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:35,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:42,825 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 05:13:35,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:46,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:35,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:47,734 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:35,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:47,734 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:35,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▎ | 1878/2230 [12:02:16<1:34:23, 16.09s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:13:49,434 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:51,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:49,434 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:54,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:49,434 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:55,601 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 05:13:49,434 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:57,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:49,434 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:59,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:49,434 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:59,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:13:49,434 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▎ | 1879/2230 [12:02:28<1:27:07, 14.89s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:14:01,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:04,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:01,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:05,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:01,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:08,134 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 05:14:01,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:10,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:01,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▍ | 1880/2230 [12:02:39<1:19:20, 13.60s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:14:11,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▍ | 1880/2230 [12:02:39<1:19:20, 13.60s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:14:11,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:14,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:11,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:16,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:11,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:18,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:11,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▍ | 1881/2230 [12:02:48<1:11:38, 12.32s/it] Setting `use_cache=False`...1] 2022-03-27 05:14:11,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▍ | 1881/2230 [12:02:48<1:11:38, 12.32s/it] Setting `use_cache=False`...1] 2022-03-27 05:14:11,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:22,358 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:21,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:24,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:21,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:26,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:21,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:28,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:21,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▍ | 1882/2230 [12:02:58<1:05:51, 11.35s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:14:30,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▍ | 1882/2230 [12:02:58<1:05:51, 11.35s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:14:30,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:32,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:30,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:34,094 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:30,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:36,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:30,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:38,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:37,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:38,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:37,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:40,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:37,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:42,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:37,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 84%|████████████████████████████████████████████████████████████████▏ | 1884/2230 [12:03:11<52:13, 9.06s/it] Setting `use_cache=False`...1] 2022-03-27 05:14:37,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|████████████████████████████████████████████████████████████████▏ | 1884/2230 [12:03:11<52:13, 9.06s/it] Setting `use_cache=False`...1] 2022-03-27 05:14:37,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|████████████████████████████████████████████████████████████████▏ | 1884/2230 [12:03:11<52:13, 9.06s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:48,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:48,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:52,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:52,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:56,289 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:56,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:59,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:03,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:03,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:07,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:07,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:10,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▌ | 1885/2230 [12:03:41<1:26:55, 15.12s/it] Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▌ | 1885/2230 [12:03:41<1:26:55, 15.12s/it] Setting `use_cache=False`...1] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▌ | 1885/2230 [12:03:41<1:26:55, 15.12s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▌ | 1885/2230 [12:03:41<1:26:55, 15.12s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:17,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:21,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:21,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:24,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:24,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:28,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:31,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:31,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:35,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:35,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:38,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:38,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▌ | 1886/2230 [12:04:09<1:49:11, 19.05s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:15:42,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▌ | 1886/2230 [12:04:09<1:49:11, 19.05s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:15:42,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:45,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:42,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:45,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:42,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:49,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:15:42,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:49,482 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:53,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:56,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:00,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:03,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:07,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 85%|██████████████████████████████████████████████████████████████▌ | 1887/2230 [12:04:37<2:04:24, 21.76s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:13,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:17,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:20,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:25,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:28,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0413, 'learning_rate': 5.9999999999999995e-05, 'epoch': 8.47}
+{'loss': 0.0408, 'learning_rate': 5.982658959537572e-05, 'epoch': 8.47}
+{'loss': 0.0403, 'learning_rate': 5.965317919075144e-05, 'epoch': 8.48}
+{'loss': 0.042, 'learning_rate': 5.9479768786127164e-05, 'epoch': 8.48}
+{'loss': 0.0374, 'learning_rate': 5.930635838150289e-05, 'epoch': 8.48}
+{'loss': 0.0483, 'learning_rate': 5.913294797687861e-05, 'epoch': 8.49}
+ 85%|██████████████████████████████████████████████████████████████▊ | 1894/2230 [12:07:48<2:30:00, 26.79s/it]
+{'loss': 0.0782, 'learning_rate': 5.895953757225433e-05, 'epoch': 8.49}
+{'loss': 0.087, 'learning_rate': 5.878612716763006e-05, 'epoch': 8.5}
+[WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0401, 'learning_rate': 5.84393063583815e-05, 'epoch': 8.51}
+[WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0318, 'learning_rate': 5.8265895953757215e-05, 'epoch': 8.51} + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0416, 'learning_rate': 5.8092485549132936e-05, 'epoch': 8.52} + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0305, 'learning_rate': 5.791907514450866e-05, 'epoch': 8.52} + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0404, 'learning_rate': 5.7745664739884384e-05, 'epoch': 8.52} + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0756, 'learning_rate': 5.7572254335260105e-05, 'epoch': 8.53} + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0329, 'learning_rate': 5.739884393063583e-05, 'epoch': 8.53} + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0344, 'learning_rate': 5.722543352601155e-05, 'epoch': 8.54} + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0259, 'learning_rate': 5.705202312138727e-05, 'epoch': 8.54} + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0342, 'learning_rate': 5.6878612716762994e-05, 'epoch': 8.55} + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0425, 'learning_rate': 5.670520231213872e-05, 'epoch': 8.55} + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0411, 'learning_rate': 5.653179190751444e-05, 'epoch': 8.56} + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0291, 'learning_rate': 5.635838150289016e-05, 'epoch': 8.56} + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0399, 'learning_rate': 5.618497109826589e-05, 'epoch': 8.57} + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0319, 'learning_rate': 5.601156069364161e-05, 'epoch': 8.57} + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0314, 'learning_rate': 5.583815028901733e-05, 'epoch': 8.57} + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0389, 'learning_rate': 5.566473988439306e-05, 'epoch': 8.58} + 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.034, 'learning_rate': 5.549132947976878e-05, 'epoch': 8.58} +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0278, 'learning_rate': 5.53179190751445e-05, 'epoch': 8.59} +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0305, 'learning_rate': 5.514450867052022e-05, 'epoch': 8.59} +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0461, 'learning_rate': 5.497109826589595e-05, 'epoch': 8.6} +[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:28:52,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:28:52,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▋ | 1918/2230 [12:17:37<1:57:47, 22.65s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▋ | 1918/2230 [12:17:37<1:57:47, 22.65s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0264, 'learning_rate': 5.479768786127167e-05, 'epoch': 8.6} +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0278, 'learning_rate': 5.462427745664739e-05, 'epoch': 8.61} +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0276, 'learning_rate': 5.445086705202312e-05, 'epoch': 8.61} +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:08,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:30:08,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:08,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:08,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:08,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0338, 'learning_rate': 5.427745664739884e-05, 'epoch': 8.61} +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:18,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:18,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:18,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:30:24,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:24,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:24,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:30,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:30,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:30,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:30,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:30:37,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:37,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:37,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:43,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:45,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:45,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:45,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:30:45,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:30:45,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▊ | 1923/2230 [12:19:20<1:45:37, 20.64s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▊ | 1923/2230 [12:19:20<1:45:37, 20.64s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▊ | 1923/2230 [12:19:20<1:45:37, 20.64s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:30:59,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:30:59,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:31:03,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:05,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:05,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:05,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:05,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:11,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:11,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:15,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:17,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:17,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:21,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:23,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:25,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:28,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:31:28,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0265, 'learning_rate': 5.358381502890173e-05, 'epoch': 8.63} +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:32,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:34,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:31:34,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:38,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:40,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:42,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:44,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:46,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:46,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:48,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:50,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:52,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:54,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:55,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:57,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:59,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:59,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:01,481 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:03,425 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:05,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:08,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:10,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:12,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:14,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:14,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:15,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:19,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:20,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:22,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:23,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:26,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:28,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:28,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:30,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:32,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:34,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:37,184 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:38,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:38,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:39,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:42,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:45,023 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:46,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:48,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:48,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:50,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:52,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:54,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:57,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:59,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:59,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:01,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:02,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:05,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:05,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:07,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:09,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:11,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:12,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:12,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0374, 'learning_rate': 5.2023121387283234e-05, 'epoch': 8.67}
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:16,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:16,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:19,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:23,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:23,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:27,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:27,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:30,906 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:30,906 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:34,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:37,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:37,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:41,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:41,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0673, 'learning_rate': 5.1849710982658955e-05, 'epoch': 8.68}
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:45,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:45,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:48,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:52,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:52,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:55,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:55,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:59,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:59,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:02,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:06,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:06,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:10,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:10,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0524, 'learning_rate': 5.1676300578034675e-05, 'epoch': 8.68}
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:13,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:17,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:17,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:20,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:20,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:24,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:27,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:27,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:30,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:30,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:34,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:34,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:34,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:37,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:41,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:41,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:44,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:44,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:48,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:51,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:51,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0507, 'learning_rate': 5.1329479768786124e-05, 'epoch': 8.69}
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0441, 'learning_rate': 5.1156069364161844e-05, 'epoch': 8.7}
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0386, 'learning_rate': 5.0982658959537565e-05, 'epoch': 8.7}
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0389, 'learning_rate': 5.080924855491329e-05, 'epoch': 8.7}
+{'loss': 0.0461, 'learning_rate': 5.063583815028901e-05, 'epoch': 8.71}
+{'loss': 0.0384, 'learning_rate': 5.0462427745664734e-05, 'epoch': 8.71}
+{'loss': 0.034, 'learning_rate': 5.028901734104046e-05, 'epoch': 8.72}
+ 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it]
+{'loss': 0.029, 'learning_rate': 5.011560693641618e-05, 'epoch': 8.72}
+{'loss': 0.0438, 'learning_rate': 4.99421965317919e-05, 'epoch': 8.73}
+{'loss': 0.0402, 'learning_rate': 4.976878612716762e-05, 'epoch': 8.73}
+{'loss': 0.034, 'learning_rate': 4.959537572254335e-05, 'epoch': 8.74}
+{'loss': 0.0376, 'learning_rate': 4.942196531791907e-05, 'epoch': 8.74}
+{'loss': 0.03, 'learning_rate': 4.924855491329479e-05, 'epoch': 8.74} + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0375, 'learning_rate': 4.907514450867052e-05, 'epoch': 8.75} + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.044, 'learning_rate': 4.890173410404624e-05, 'epoch': 8.75} + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0438, 'learning_rate': 4.872832369942196e-05, 'epoch': 8.76} + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0375, 'learning_rate': 4.855491329479768e-05, 'epoch': 8.76} + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0526, 'learning_rate': 4.838150289017341e-05, 'epoch': 8.77} + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0449, 'learning_rate': 4.820809248554913e-05, 'epoch': 8.77} + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0414, 'learning_rate': 4.803468208092485e-05, 'epoch': 8.78} + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.031, 'learning_rate': 4.786127167630058e-05, 'epoch': 8.78} +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0402, 'learning_rate': 4.76878612716763e-05, 'epoch': 8.78} +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0268, 'learning_rate': 4.751445086705202e-05, 'epoch': 8.79} +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.036, 'learning_rate': 4.734104046242774e-05, 'epoch': 8.79} +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0352, 'learning_rate': 4.716763005780347e-05, 'epoch': 8.8} + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0335, 'learning_rate': 4.699421965317919e-05, 'epoch': 8.8} + 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████���███████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0366, 'learning_rate': 4.6647398843930636e-05, 'epoch': 8.81} +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0244, 'learning_rate': 4.647398843930636e-05, 'epoch': 8.82} +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▎ | 1967/2230 [12:35:44<1:40:32, 22.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▎ | 1967/2230 [12:35:44<1:40:32, 22.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0418, 'learning_rate': 4.630057803468208e-05, 'epoch': 8.82} + 88%|█████████████████████████████████████████████████████████████████▎ | 1967/2230 [12:35:44<1:40:32, 22.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▎ | 1967/2230 [12:35:44<1:40:32, 22.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|█████████████████████████████████████████████████████████████████▎ | 1967/2230 [12:35:44<1:40:32, 22.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▎ | 1967/2230 [12:35:44<1:40:32, 22.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▎ | 1967/2230 [12:35:44<1:40:32, 22.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▎ | 1967/2230 [12:35:44<1:40:32, 22.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▎ | 1967/2230 [12:35:44<1:40:32, 22.94s/it]g-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:35,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:35,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:47:35,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0453, 'learning_rate': 4.61271676300578e-05, 'epoch': 8.83} +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:35,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:35,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:35,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:47,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:47,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:47,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:47:47,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:47,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:57,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:57,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:57,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0253, 'learning_rate': 4.5953757225433526e-05, 'epoch': 8.83} +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:57,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:57,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:47:57,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:47:57,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:11,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:11,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:11,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:11,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:11,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:48:11,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0302, 'learning_rate': 4.5780346820809246e-05, 'epoch': 8.83} +[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:24,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:24,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:24,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:30,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:30,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:30,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:48:30,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:38,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:38,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:38,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0309, 'learning_rate': 4.560693641618497e-05, 'epoch': 8.84} +[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:38,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:46,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:46,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:48:46,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:52,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:52,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:56,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:56,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:56,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:48:56,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0245, 'learning_rate': 4.5433526011560694e-05, 'epoch': 8.84} +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:04,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:04,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:09,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:09,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:09,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:15,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:15,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:49:15,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:20,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:20,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:23,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:23,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:27,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:27,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 05:49:31,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:33,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 05:49:33,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:37,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▌ | 1974/2230 [12:38:07<1:24:57, 19.91s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▌ | 1974/2230 [12:38:07<1:24:57, 19.91s/it] Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0278, 'learning_rate': 4.5086705202312136e-05, 'epoch': 8.85} +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:43,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:45,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:47,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:49,921 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:52,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:54,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:56,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:56,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:56,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:00,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:02,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:04,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:06,478 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:08,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:10,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:12,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▌ | 1976/2230 [12:38:41<1:18:09, 18.46s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▌ | 1976/2230 [12:38:41<1:18:09, 18.46s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:16,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:18,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:20,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:22,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:23,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:25,801 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▌ | 1977/2230 [12:38:56<1:13:37, 17.46s/it] Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▌ | 1977/2230 [12:38:56<1:13:37, 17.46s/it] Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:31,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:29,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:33,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:29,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:34,857 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:29,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:36,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:29,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:38,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:29,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:39,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:29,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:39,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:29,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▋ | 1978/2230 [12:39:10<1:08:50, 16.39s/it] Setting `use_cache=False`...1] 2022-03-27 05:50:29,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:45,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:43,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:46,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:43,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:48,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:43,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:49,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:43,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:52,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:43,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:54,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:43,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:54,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:43,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▋ | 1979/2230 [12:39:23<1:04:03, 15.31s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:50:56,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:59,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:56,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:00,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:56,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:03,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:56,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:04,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:56,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:04,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:56,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|███████████████████████████████████████████████████████████████████▍ | 1980/2230 [12:39:35<58:56, 14.15s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:51:07,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:08,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:07,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:11,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:07,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:12,737 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:07,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:15,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:07,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:15,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:07,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|███████████████████████████████████████████████████████████████████▌ | 1981/2230 [12:39:45<53:50, 12.97s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:51:17,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:18,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:17,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:21,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:17,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:23,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:17,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:25,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:17,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:25,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:17,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|███████████████████████████████████████████████████████████████████▌ | 1982/2230 [12:39:54<49:18, 11.93s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:51:27,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:29,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:27,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:31,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:27,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:33,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:27,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:33,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:27,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:35,613 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:34,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:37,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:34,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:39,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:34,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:39,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:34,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|███████████████████████████████████████████████████████████████████▌ | 1984/2230 [12:40:08<38:34, 9.41s/it] Setting `use_cache=False`...1] 2022-03-27 05:51:34,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|███████████████████████████████████████████████████████████████████▌ | 1984/2230 [12:40:08<38:34, 9.41s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:45,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:45,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:49,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:49,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:53,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:53,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:56,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:00,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:00,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:03,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:03,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:07,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:07,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▊ | 1985/2230 [12:40:37<1:01:54, 15.16s/it] Setting `use_cache=False`...1] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▊ | 1985/2230 [12:40:37<1:01:54, 15.16s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:14,189 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:14,189 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:17,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:17,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:21,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:24,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:24,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:28,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:28,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:31,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:34,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:34,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:34,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▉ | 1986/2230 [12:41:05<1:17:05, 18.96s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 89%|█████████████████████████████████████████████████████████████████▉ | 1986/2230 [12:41:05<1:17:05, 18.96s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:41,870 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:45,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:45,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:48,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:48,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:51,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:55,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:55,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:58,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:58,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:02,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████��███████████████████████████████████████████████▉ | 1987/2230 [12:41:32<1:26:47, 21.43s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▉ | 1987/2230 [12:41:32<1:26:47, 21.43s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.05, 'learning_rate': 4.283236994219653e-05, 'epoch': 8.91} +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:08,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:08,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:12,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:15,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:15,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:15,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:20,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0402, 'learning_rate': 4.265895953757225e-05, 'epoch': 8.91} +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0487, 'learning_rate': 4.248554913294798e-05, 'epoch': 8.92} +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0366, 'learning_rate': 4.23121387283237e-05, 'epoch': 8.92} + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0349, 'learning_rate': 4.213872832369942e-05, 'epoch': 8.93} + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0305, 'learning_rate': 4.196531791907514e-05, 'epoch': 8.93} + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████▏ | 1993/2230 [12:44:09<1:39:16, 25.13s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████▏ | 1993/2230 [12:44:09<1:39:16, 25.13s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0443, 'learning_rate': 4.179190751445087e-05, 'epoch': 8.94} + 89%|██████████████████████████████████████████████████████████████████▏ | 1993/2230 [12:44:09<1:39:16, 25.13s/it] Setting `use_cache=False`...1] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 89%|██████████████████████████████████████████████████████████████████▏ | 1993/2230 [12:44:09<1:39:16, 25.13s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0418, 'learning_rate': 4.161849710982658e-05, 'epoch': 8.94}
+{'loss': 0.03, 'learning_rate': 4.1445086705202304e-05, 'epoch': 8.95}
+[WARNING|modeling_utils.py:388] 2022-03-27 05:56:38,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 05:56:49,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0308, 'learning_rate': 4.1271676300578025e-05, 'epoch': 8.95}
+{'loss': 0.0293, 'learning_rate': 4.109826589595375e-05, 'epoch': 8.96}
+{'loss': 0.0419, 'learning_rate': 4.092485549132947e-05, 'epoch': 8.96}
+ 90%|██████████████████████████████████████████████████████████████████▎ | 1999/2230 [12:46:31<1:30:18, 23.46s/it]
+{'loss': 0.0258, 'learning_rate': 4.075144508670519e-05, 'epoch': 8.96}
+[WARNING|modeling_bart.py:1051] 2022-03-27 05:58:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 05:58:14,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642
+{'loss': 0.0291, 'learning_rate': 4.057803468208092e-05, 'epoch': 8.97}
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+03/27/2022 06:07:57 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
+{'eval_loss': 0.3589690625667572, 'eval_wer': 0.09641015470051567, 'eval_runtime': 570.6282, 'eval_samples_per_second': 4.63, 'eval_steps_per_second': 0.58, 'epoch': 8.97}