diff --git "a/wandb/run-20220326_171130-bdf5nvyg/files/output.log" "b/wandb/run-20220326_171130-bdf5nvyg/files/output.log"
--- "a/wandb/run-20220326_171130-bdf5nvyg/files/output.log"
+++ "b/wandb/run-20220326_171130-bdf5nvyg/files/output.log"
@@ -24651,3 +24651,2707 @@
 [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+03/27/2022 06:09:26 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220326_171130-bdf5nvyg/logs/debug-internal.log']. This may take a bit of time if the files are large.
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 06:09:57,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:09:57,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:09:57,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:09:57,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:05,598 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:05,598 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:09,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:10:09,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:13,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:13,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0345, 'learning_rate': 4.040462427745664e-05, 'epoch': 8.97} +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:17,727 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:17,727 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:17,727 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:24,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:10:24,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:24,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:24,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:32,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:32,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|████████████████████████████████████████████████████████████████▋ | 2002/2230 [12:59:02<10:30:50, 166.01s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|████████████████████████████████████████████████████████████████▋ | 2002/2230 [12:59:02<10:30:50, 166.01s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:38,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:40,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:40,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:40,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:46,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:48,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:48,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:10:52,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:10:52,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0374, 'learning_rate': 4.005780346820808e-05, 'epoch': 8.98} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:56,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:59,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:01,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:03,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:05,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:06,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:08,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:08,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:10,842 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:12,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:14,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:15,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:19,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:20,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:20,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:22,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:25,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:26,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:29,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:30,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:32,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:32,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:35,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:37,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:39,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:41,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:41,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0245, 'learning_rate': 3.936416184971098e-05, 'epoch': 9.0} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:45,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:45,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:49,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:49,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:52,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:52,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:56,748 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:56,748 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:00,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:04,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:04,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:08,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:08,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:11,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:11,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:11,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0298, 'learning_rate': 3.901734104046242e-05, 'epoch': 9.01} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0394, 'learning_rate': 3.884393063583814e-05, 'epoch': 9.01} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0377, 'learning_rate': 3.867052023121387e-05, 'epoch': 9.02} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0317, 'learning_rate': 3.849710982658959e-05, 'epoch': 9.02} + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0363, 'learning_rate': 3.832369942196531e-05, 'epoch': 9.03} + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0391, 'learning_rate': 3.815028901734104e-05, 'epoch': 9.03} + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0324, 'learning_rate': 3.797687861271676e-05, 'epoch': 9.04} + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.042, 'learning_rate': 3.780346820809248e-05, 'epoch': 9.04} + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.029, 'learning_rate': 3.76300578034682e-05, 'epoch': 9.04} + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0427, 'learning_rate': 3.745664739884393e-05, 'epoch': 9.05} + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0366, 'learning_rate': 3.728323699421965e-05, 'epoch': 9.05} + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0306, 'learning_rate': 3.710982658959537e-05, 'epoch': 9.06} + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0419, 'learning_rate': 3.6936416184971096e-05, 'epoch': 9.06} + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0289, 'learning_rate': 3.6763005780346816e-05, 'epoch': 9.07} + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0274, 'learning_rate': 3.6416184971098265e-05, 'epoch': 9.08} + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2688, 'learning_rate': 3.6242774566473985e-05, 'epoch': 9.08} + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0241, 'learning_rate': 3.6069364161849706e-05, 'epoch': 9.09} + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0307, 'learning_rate': 3.5895953757225427e-05, 'epoch': 9.09} + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2028/2230 [13:09:32<1:25:41, 25.45s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2028/2230 [13:09:32<1:25:41, 25.45s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0317, 'learning_rate': 3.5722543352601154e-05, 'epoch': 9.09} + 91%|███████████████████████████████████████████████████████████████████▎ | 2028/2230 [13:09:32<1:25:41, 25.45s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2028/2230 [13:09:32<1:25:41, 25.45s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▎ | 2028/2230 [13:09:32<1:25:41, 25.45s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2028/2230 [13:09:32<1:25:41, 25.45s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2028/2230 [13:09:32<1:25:41, 25.45s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2028/2230 [13:09:32<1:25:41, 25.45s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2028/2230 [13:09:32<1:25:41, 25.45s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0328, 'learning_rate': 3.5549132947976875e-05, 'epoch': 9.1} +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0224, 'learning_rate': 3.520231213872832e-05, 'epoch': 9.11} + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0383, 'learning_rate': 3.5028901734104043e-05, 'epoch': 9.11} + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0319, 'learning_rate': 3.4855491329479764e-05, 'epoch': 9.12} + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:16,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:16,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:23:16,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:16,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:16,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:16,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:16,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0283, 'learning_rate': 3.4682080924855485e-05, 'epoch': 9.12} +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0184, 'learning_rate': 3.450867052023121e-05, 'epoch': 9.13} +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0273, 'learning_rate': 3.433526011560693e-05, 'epoch': 9.13} + g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0288, 'learning_rate': 3.4161849710982654e-05, 'epoch': 9.13} +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0342, 'learning_rate': 3.398843930635838e-05, 'epoch': 9.14} + 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0355, 'learning_rate': 3.38150289017341e-05, 'epoch': 9.14} + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0316, 'learning_rate': 3.364161849710982e-05, 'epoch': 9.15}
+ 91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]
+[WARNING|modeling_utils.py:388] 2022-03-27 06:26:00,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0263, 'learning_rate': 3.346820809248554e-05, 'epoch': 9.15}
+[WARNING|modeling_utils.py:388] 2022-03-27 06:26:00,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 06:26:18,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 06:26:25,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 06:26:29,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0373, 'learning_rate': 3.329479768786127e-05, 'epoch': 9.16}
+[WARNING|modeling_utils.py:388] 2022-03-27 06:26:29,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 06:26:37,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 06:26:47,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0224, 'learning_rate': 3.312138728323699e-05, 'epoch': 9.16}
+[WARNING|modeling_utils.py:388] 2022-03-27 06:26:57,914 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 92%|███████████████████████████████████████████████████████████████████▊ | 2044/2230 [13:15:41<1:06:46, 21.54s/it]
+{'loss': 0.029, 'learning_rate': 3.294797687861271e-05, 'epoch': 9.17}
+ 92%|███████████████████████████████████████████████████████████████████▊ | 2044/2230 [13:15:41<1:06:46, 21.54s/it]
+[WARNING|modeling_utils.py:388] 2022-03-27 06:27:20,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:27:24,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 06:27:32,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:27:36,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 06:27:40,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:27:45,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 06:27:49,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 92%|███████████████████████████████████████████████████████████████████▉ | 2046/2230 [13:16:20<1:02:35, 20.41s/it]
+[WARNING|modeling_utils.py:388] 2022-03-27 06:27:55,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 06:27:57,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:01,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:03,761 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 06:28:07,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 06:28:09,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0351, 'learning_rate': 3.242774566473988e-05, 'epoch': 9.18}
+[WARNING|modeling_utils.py:388] 2022-03-27 06:28:13,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 06:28:15,268 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 06:28:17,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 06:28:19,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:23,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:25,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:27,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:31,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:33,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:35,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:37,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:39,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:41,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:43,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:45,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:47,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:49,023 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:52,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:56,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:57,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:00,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:02,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:04,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:05,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:07,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:09,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:12,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:14,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:15,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:17,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:20,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:21,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:24,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:26,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:27,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:30,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:32,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:34,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:36,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:38,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:40,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:42,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:45,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:46,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:48,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:50,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:52,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:54,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:57,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:59,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:01,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:01,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:02,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:05,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:07,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:08,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:08,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0711, 'learning_rate': 3.069364161849711e-05, 'epoch': 9.22} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:11,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:11,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:15,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:19,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:19,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:22,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:22,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:26,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:26,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:30,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:30,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:33,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:37,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:37,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0487, 'learning_rate': 3.052023121387283e-05, 'epoch': 9.23} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:41,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:41,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:44,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:44,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:48,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:51,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:51,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:55,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:55,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:58,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:02,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:02,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:02,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:05,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:05,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:09,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:09,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:13,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:16,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:16,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:20,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:20,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:23,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:27,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:27,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:30,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:30,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:30,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:34,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:37,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:37,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:41,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:41,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:44,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:47,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:47,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:51,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:51,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0409, 'learning_rate': 2.9999999999999997e-05, 'epoch': 9.24} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.038, 'learning_rate': 2.982658959537572e-05, 'epoch': 9.25} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0301, 'learning_rate': 2.9653179190751446e-05, 'epoch': 9.25} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0407, 'learning_rate': 2.9479768786127166e-05, 'epoch': 9.26} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0388, 'learning_rate': 2.930635838150289e-05, 'epoch': 9.26} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0413, 'learning_rate': 2.9132947976878608e-05, 'epoch': 9.26} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0317, 'learning_rate': 2.895953757225433e-05, 'epoch': 9.27} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0397, 'learning_rate': 2.8786127167630052e-05, 'epoch': 9.27} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0456, 'learning_rate': 2.8612716763005776e-05, 'epoch': 9.28} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0348, 'learning_rate': 2.8439306358381497e-05, 'epoch': 9.28} + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0273, 'learning_rate': 2.826589595375722e-05, 'epoch': 9.29} + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0381, 'learning_rate': 2.8092485549132945e-05, 'epoch': 9.29} + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0338, 'learning_rate': 2.7919075144508666e-05, 'epoch': 9.3} + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0308, 'learning_rate': 2.774566473988439e-05, 'epoch': 9.3} + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0325, 'learning_rate': 2.757225433526011e-05, 'epoch': 9.3} + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0207, 'learning_rate': 2.7398843930635835e-05, 'epoch': 9.31} + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0315, 'learning_rate': 2.722543352601156e-05, 'epoch': 9.31} + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0329, 'learning_rate': 2.705202312138728e-05, 'epoch': 9.32} + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0331, 'learning_rate': 2.6878612716763003e-05, 'epoch': 9.32} + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0251, 'learning_rate': 2.6705202312138724e-05, 'epoch': 9.33} + 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0311, 'learning_rate': 2.6531791907514448e-05, 'epoch': 9.33} + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0378, 'learning_rate': 2.635838150289017e-05, 'epoch': 9.34} + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0282, 'learning_rate': 2.6184971098265893e-05, 'epoch': 9.34} + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0314, 'learning_rate': 2.6011560693641617e-05, 'epoch': 9.35} +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0311, 'learning_rate': 2.5838150289017338e-05, 'epoch': 9.35} +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.034, 'learning_rate': 2.5664739884393062e-05, 'epoch': 9.35} +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0324, 'learning_rate': 2.5491329479768782e-05, 'epoch': 9.36} +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0358, 'learning_rate': 2.5317919075144507e-05, 'epoch': 9.36} +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0237, 'learning_rate': 2.514450867052023e-05, 'epoch': 9.37} +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0342, 'learning_rate': 2.497109826589595e-05, 'epoch': 9.37} +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0208, 'learning_rate': 2.4797687861271675e-05, 'epoch': 9.38} + 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:44:58,273 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:44:58,273 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:44:58,273 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:18,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:18,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0341, 'learning_rate': 2.445086705202312e-05, 'epoch': 9.39} +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:18,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:18,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:45:18,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0236, 'learning_rate': 2.427745664739884e-05, 'epoch': 9.39} +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:47,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:47,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:47,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:47,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:45:55,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:45:55,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:59,582 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▍ | 2095/2230 [13:34:29<47:50, 21.26s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▍ | 2095/2230 [13:34:29<47:50, 21.26s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0257, 'learning_rate': 2.4104046242774565e-05, 'epoch': 9.39} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:05,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:05,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:46:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:19,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:19,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:46:21,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:21,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:21,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:27,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:29,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:29,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:34,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:34,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:37,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:37,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0304, 'learning_rate': 2.375722543352601e-05, 'epoch': 9.4} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:42,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:44,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:46,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:48,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:51,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:51,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:54,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:54,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:46:54,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:58,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:00,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:02,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:04,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:06,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:09,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:11,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:13,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:13,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:15,122 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:17,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:19,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:20,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:22,844 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:24,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:26,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:28,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:28,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:31,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:33,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:34,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:36,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:38,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:41,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:41,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:43,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:45,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:46,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:50,135 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:51,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:53,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:56,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:56,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:57,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:59,442 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:02,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:03,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:06,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:07,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:07,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:10,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:11,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:14,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:15,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:17,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:17,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:20,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:22,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:24,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:26,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:26,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:28,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:30,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:32,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:34,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:34,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:36,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:38,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:40,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:41,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:41,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:45,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:45,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:48,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:48,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:52,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:52,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:55,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:55,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:59,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:03,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:03,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:06,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:06,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:10,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:10,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0482, 'learning_rate': 2.184971098265896e-05, 'epoch': 9.45} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:13,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:17,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:17,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:21,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:21,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:24,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:28,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:28,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:31,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:31,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:35,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:35,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:38,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:38,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:42,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:42,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:45,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:45,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:49,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:52,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:52,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:56,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:56,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:59,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:03,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:03,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:06,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:06,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0355, 'learning_rate': 2.1502890173410405e-05, 'epoch': 9.46} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:10,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:10,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:13,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:17,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:17,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0358, 'learning_rate': 2.1329479768786126e-05, 'epoch': 9.47} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0419, 'learning_rate': 2.115606936416185e-05, 'epoch': 9.47} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0411, 'learning_rate': 2.098265895953757e-05, 'epoch': 9.48} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0319, 'learning_rate': 2.080924855491329e-05, 'epoch': 9.48} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0288, 'learning_rate': 2.0635838150289012e-05, 'epoch': 9.48} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0269, 'learning_rate': 2.0462427745664736e-05, 'epoch': 9.49} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0355, 'learning_rate': 2.028901734104046e-05, 'epoch': 9.49} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.032, 'learning_rate': 2.011560693641618e-05, 'epoch': 9.5}
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0251, 'learning_rate': 1.9942196531791905e-05, 'epoch': 9.5}
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0295, 'learning_rate': 1.9768786127167626e-05, 'epoch': 9.51}
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0405, 'learning_rate': 1.959537572254335e-05, 'epoch': 9.51}
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2122/2230 [13:43:57<46:58, 26.10s/it]
+ 95%|████████████████████████████████████████████████████████████████████████▎ | 2122/2230 [13:43:57<46:58, 26.10s/it] +{'loss': 0.0289, 'learning_rate': 1.942196531791907e-05, 'epoch': 9.52}
+ 95%|████████████████████████████████████████████████████████████████████████▎ | 2122/2230 [13:43:57<46:58, 26.10s/it] +{'loss': 0.0337, 'learning_rate': 1.9248554913294795e-05, 'epoch': 9.52}
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2124/2230 [13:44:48<45:35, 25.81s/it] +{'loss': 0.035, 'learning_rate': 1.907514450867052e-05, 'epoch': 9.52}
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2124/2230 [13:44:48<45:35, 25.81s/it] +{'loss': 0.029, 'learning_rate': 1.890173410404624e-05, 'epoch': 9.53}
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] +{'loss': 0.03, 'learning_rate': 1.8728323699421963e-05, 'epoch': 9.53}
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0287, 'learning_rate': 1.8554913294797684e-05, 'epoch': 9.54} + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0354, 'learning_rate': 1.8381502890173408e-05, 'epoch': 9.54} + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0262, 'learning_rate': 1.8208092485549132e-05, 'epoch': 9.55} + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0225, 'learning_rate': 1.8034682080924853e-05, 'epoch': 9.55} + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0288, 'learning_rate': 1.7861271676300577e-05, 'epoch': 9.56} + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:59:26,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:59:26,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:59:26,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:59:26,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 06:59:26,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:59:26,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:59:26,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 06:59:26,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0241, 'learning_rate': 1.7687861271676298e-05, 'epoch': 9.56} +[WARNING|modeling_utils.py:388] 2022-03-27 06:59:26,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0277, 'learning_rate': 1.7514450867052022e-05, 'epoch': 9.57} +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2134/2230 [13:48:55<38:54, 24.32s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2134/2230 [13:48:55<38:54, 24.32s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0298, 'learning_rate': 1.7341040462427742e-05, 'epoch': 9.57} + 96%|████████████████████████████████████████████████████████████████████████▋ | 2134/2230 [13:48:55<38:54, 24.32s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2134/2230 [13:48:55<38:54, 24.32s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2134/2230 [13:48:55<38:54, 24.32s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2134/2230 [13:48:55<38:54, 24.32s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2134/2230 [13:48:55<38:54, 24.32s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2134/2230 [13:48:55<38:54, 24.32s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2134/2230 [13:48:55<38:54, 24.32s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0278, 'learning_rate': 1.7167630057803466e-05, 'epoch': 9.57} +[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:01:07,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:01:07,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:01:07,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:01:07,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|████████████████████████████████████████████████████████████████████████▊ | 2136/2230 [13:49:42<37:27, 23.91s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2136/2230 [13:49:42<37:27, 23.91s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0253, 'learning_rate': 1.699421965317919e-05, 'epoch': 9.58} + 96%|████████████████████████████████████████████████████████████████████████▊ | 2136/2230 [13:49:42<37:27, 23.91s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2136/2230 [13:49:42<37:27, 23.91s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:01:23,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:01:23,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:01:23,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0331, 'learning_rate': 1.682080924855491e-05, 'epoch': 9.58}
+[WARNING|modeling_utils.py:388] 2022-03-27 07:01:48,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0272, 'learning_rate': 1.6647398843930635e-05, 'epoch': 9.59}
+[WARNING|modeling_utils.py:388] 2022-03-27 07:02:23,045 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0237, 'learning_rate': 1.6473988439306356e-05, 'epoch': 9.59}
+{'loss': 0.027, 'learning_rate': 1.630057803468208e-05, 'epoch': 9.6}
+[WARNING|modeling_utils.py:388] 2022-03-27 07:02:49,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 07:02:53,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:03:08,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.026, 'learning_rate': 1.61271676300578e-05, 'epoch': 9.6}
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:03:12,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:03:16,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:03:24,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 96%|█████████████████████████████████████████████████████████████████████████ | 2142/2230 [13:51:58<32:47, 22.36s/it]
+{'loss': 0.0281, 'learning_rate': 1.5953757225433525e-05, 'epoch': 9.61}
+[WARNING|modeling_utils.py:388] 2022-03-27 07:03:52,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 07:03:56,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 07:04:03,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 96%|█████████████████████████████████████████████████████████████████████████ | 2144/2230 [13:52:41<31:22, 21.89s/it]
+{'loss': 0.0188, 'learning_rate': 1.560693641618497e-05, 'epoch': 9.61}
+[WARNING|modeling_utils.py:388] 2022-03-27 07:04:19,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 07:04:25,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:04:31,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 96%|█████████████████████████████████████████████████████████████████████████ | 2145/2230 [13:53:01<30:18, 21.39s/it]
+[WARNING|modeling_utils.py:388] 2022-03-27 07:04:35,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:04:40,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:04:46,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:04:50,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:04:56,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:05:00,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 07:05:02,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 07:05:08,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-27 07:05:10,533 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:14,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:05:18,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:22,493 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:24,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:26,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:31,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:33,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:35,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:37,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:39,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:41,587 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:43,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:45,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:45,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:47,706 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:49,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:51,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:53,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:55,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:57,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:59,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:01,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:01,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:03,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:05,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:07,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:09,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:11,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:14,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:16,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:16,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:18,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:19,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:21,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:22,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:26,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:27,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:32,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:33,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:35,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:37,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:39,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:39,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:42,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:43,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:47,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:48,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:50,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:50,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:53,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:54,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:56,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:58,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:00,844 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:00,844 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:02,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:04,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:06,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:08,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:08,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:10,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:12,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:14,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:14,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.056, 'learning_rate': 1.3352601156069362e-05, 'epoch': 9.67} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:21,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:25,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:25,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:28,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:28,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:32,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:32,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:36,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:39,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:39,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:43,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:43,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:43,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:47,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:47,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:50,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:54,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:54,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:57,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:57,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:01,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:01,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:04,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:08,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:08,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:11,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:11,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0395, 'learning_rate': 1.3005780346820809e-05, 'epoch': 9.68} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:15,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:18,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:18,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:22,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:22,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:25,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:25,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:29,181 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:32,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:32,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:36,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:36,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:36,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:39,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:42,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:42,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:46,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:46,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:49,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0302, 'learning_rate': 1.2658959537572253e-05, 'epoch': 9.69} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.037, 'learning_rate': 1.2485549132947976e-05, 'epoch': 9.7} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0276, 'learning_rate': 1.2312138728323698e-05, 'epoch': 9.7} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0316, 'learning_rate': 1.213872832369942e-05, 'epoch': 9.7} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0344, 'learning_rate': 1.1965317919075144e-05, 'epoch': 9.71} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0365, 'learning_rate': 1.1791907514450867e-05, 'epoch': 9.71} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0248, 'learning_rate': 1.161849710982659e-05, 'epoch': 9.72} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.027, 'learning_rate': 1.1445086705202312e-05, 'epoch': 9.72} + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0345, 'learning_rate': 1.1271676300578034e-05, 'epoch': 9.73} + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0264, 'learning_rate': 1.1098265895953756e-05, 'epoch': 9.73} + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|████████████████████████████████████████���████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0279, 'learning_rate': 1.092485549132948e-05, 'epoch': 9.74} + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████��██████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0295, 'learning_rate': 1.0751445086705203e-05, 'epoch': 9.74} + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|████████████████████████████████████████████████████��████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.034, 'learning_rate': 1.0578034682080925e-05, 'epoch': 9.74} + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0232, 'learning_rate': 1.0404624277456646e-05, 'epoch': 9.75} + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.028, 'learning_rate': 1.0231213872832368e-05, 'epoch': 9.75} + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|███████████████████████████████████████████████████████████████████████���██ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0351, 'learning_rate': 1.005780346820809e-05, 'epoch': 9.76} + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|████████████████████████████████████████████████████████████████████████���█▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0379, 'learning_rate': 9.884393063583813e-06, 'epoch': 9.76} + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████��███▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0293, 'learning_rate': 9.710982658959535e-06, 'epoch': 9.77} + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0337, 'learning_rate': 9.53757225433526e-06, 'epoch': 9.77} + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0362, 'learning_rate': 9.364161849710982e-06, 'epoch': 9.78} + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0274, 'learning_rate': 9.190751445086704e-06, 'epoch': 9.78} + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0325, 'learning_rate': 9.017341040462426e-06, 'epoch': 9.78} + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0346, 'learning_rate': 8.843930635838149e-06, 'epoch': 9.79} + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0263, 'learning_rate': 8.670520231213871e-06, 'epoch': 9.79} + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.035, 'learning_rate': 8.497109826589595e-06, 'epoch': 9.8} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0327, 'learning_rate': 8.323699421965318e-06, 'epoch': 9.8} + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0339, 'learning_rate': 8.15028901734104e-06, 'epoch': 9.81} + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0233, 'learning_rate': 7.976878612716762e-06, 'epoch': 9.81} + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.028, 'learning_rate': 7.803468208092485e-06, 'epoch': 9.82} +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0294, 'learning_rate': 7.630057803468207e-06, 'epoch': 9.82} +[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:26,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:26,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:30,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:30,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:30,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:30,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:21:30,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:30,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:30,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0224, 'learning_rate': 7.45664739884393e-06, 'epoch': 9.83} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:21:44,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:21:44,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:21:44,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:50,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:21:50,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:50,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:50,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:50,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:50,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:21:50,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0323, 'learning_rate': 7.283236994219652e-06, 'epoch': 9.83} +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:04,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:22:04,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:08,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:08,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:08,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:08,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:16,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:16,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:22:21,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:22:21,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:22:21,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0337, 'learning_rate': 7.109826589595374e-06, 'epoch': 9.83} +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:26,887 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:26,887 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:26,887 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:33,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:22:33,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:33,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:33,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:33,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:33,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▊ | 2194/2230 [14:11:11<13:00, 21.69s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:45,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:22:45,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:45,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:45,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:22:54,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:22:54,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:58,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:22:58,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:23:01,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:01,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.013, 'learning_rate': 6.76300578034682e-06, 'epoch': 9.84} +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:01,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:08,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:08,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:08,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:14,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:23:14,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:23:18,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:23:18,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:22,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:22,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0371, 'learning_rate': 6.589595375722542e-06, 'epoch': 9.85} +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:22,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:28,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:23:28,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:23:32,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:23:34,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:23:34,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:38,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:41,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:41,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0245, 'learning_rate': 6.4161849710982654e-06, 'epoch': 9.85} +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:41,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:46,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:48,930 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:51,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:53,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:55,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:23:57,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:23:57,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0334, 'learning_rate': 6.242774566473988e-06, 'epoch': 9.86} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:01,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:03,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:05,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:07,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:09,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:11,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:13,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:13,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|██████████████████████████████████████████████████████████████████████████▉ | 2199/2230 [14:12:42<09:25, 18.24s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:24:15,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:16,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:15,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:18,850 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:15,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:20,724 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:15,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:22,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:15,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:24,397 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:15,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:28,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:15,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:28,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:15,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:28,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:15,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|██████████████████████████████████████████████████████████████████████████▉ | 2200/2230 [14:12:58<08:44, 17.48s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:24:30,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:32,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:30,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:34,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:30,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:37,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:30,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:39,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:30,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:41,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:30,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████ | 2201/2230 [14:13:11<07:53, 16.34s/it] Setting `use_cache=False`...1] 2022-03-27 07:24:30,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████ | 2201/2230 [14:13:11<07:53, 16.34s/it] Setting `use_cache=False`...1] 2022-03-27 07:24:30,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:45,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:44,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:47,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:44,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:49,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:44,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:52,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:44,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:53,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:44,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████ | 2202/2230 [14:13:23<07:02, 15.09s/it] Setting `use_cache=False`...1] 2022-03-27 07:24:44,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████ | 2202/2230 [14:13:23<07:02, 15.09s/it] Setting `use_cache=False`...1] 2022-03-27 07:24:44,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:57,857 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:56,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:59,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:56,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:01,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:56,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:03,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:56,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:05,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:56,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:05,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:24:56,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████ | 2203/2230 [14:13:34<06:12, 13.81s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:25:07,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:09,704 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:07,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:12,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:07,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:14,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:07,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████ | 2204/2230 [14:13:44<05:25, 12.52s/it] Setting `use_cache=False`...1] 2022-03-27 07:25:07,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████ | 2204/2230 [14:13:44<05:25, 12.52s/it] Setting `use_cache=False`...1] 2022-03-27 07:25:07,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:17,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:16,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:19,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:16,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:21,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:16,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▏| 2205/2230 [14:13:52<04:41, 11.28s/it] Setting `use_cache=False`...1] 2022-03-27 07:25:16,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▏| 2205/2230 [14:13:52<04:41, 11.28s/it] Setting `use_cache=False`...1] 2022-03-27 07:25:16,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:25,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:25,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:28,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:25,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:30,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:25,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▏| 2206/2230 [14:14:00<04:02, 10.10s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:25:32,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▏| 2206/2230 [14:14:00<04:02, 10.10s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:25:32,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:34,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:32,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:36,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:32,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:38,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:32,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▏| 2207/2230 [14:14:07<03:31, 9.21s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▏| 2207/2230 [14:14:07<03:31, 9.21s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0398, 'learning_rate': 4.682080924855491e-06, 'epoch': 9.9} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:44,397 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:44,397 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:48,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:48,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:51,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:51,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:55,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:58,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:58,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:02,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:02,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:05,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2208/2230 [14:14:36<05:34, 15.21s/it] Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2208/2230 [14:14:36<05:34, 15.21s/it] Setting `use_cache=False`...1] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2208/2230 [14:14:36<05:34, 15.21s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:13,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:13,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:16,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:16,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:19,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:23,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:23,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:26,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:26,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:30,180 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:33,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2209/2230 [14:15:03<06:37, 18.93s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2209/2230 [14:15:03<06:37, 18.93s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:40,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:44,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:44,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:47,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:47,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:50,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:54,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:54,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:57,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:57,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:01,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:01,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2210/2230 [14:15:31<07:09, 21.47s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2210/2230 [14:15:31<07:09, 21.47s/it][WARNING|modeling_bart.py:1051] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:07,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:07,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:11,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:14,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:14,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:17,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:17,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:21,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:24,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:24,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:28,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:28,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.035, 'learning_rate': 3.8150289017341036e-06, 'epoch': 9.92} + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0309, 'learning_rate': 3.641618497109826e-06, 'epoch': 9.92} + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0347, 'learning_rate': 3.4682080924855487e-06, 'epoch': 9.93} + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0296, 'learning_rate': 3.294797687861271e-06, 'epoch': 9.93} + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0381, 'learning_rate': 2.9479768786127167e-06, 'epoch': 9.94} + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2218/2230 [14:18:57<04:59, 24.95s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2218/2230 [14:18:57<04:59, 24.95s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0363, 'learning_rate': 2.774566473988439e-06, 'epoch': 9.95} + 99%|███████████████████████████████████████████████████████████████████████████▌| 2218/2230 [14:18:57<04:59, 24.95s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2218/2230 [14:18:57<04:59, 24.95s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|███████████████████████████████████████████████████████████████████████████▌| 2218/2230 [14:18:57<04:59, 24.95s/it] Setting `use_cache=False`...1] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:30:40,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:30:40,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:30:40,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:30:40,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:30:40,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:30:40,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:30:40,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+100%|███████████████████████████████████████████████████████████████████████████▋| 2219/2230 [14:19:22<04:34, 25.00s/it]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2219/2230 [14:19:22<04:34, 25.00s/it]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0239, 'learning_rate': 2.6011560693641614e-06, 'epoch': 9.95} +100%|███████████████████████████████████████████████████████████████████████████▋| 2219/2230 [14:19:22<04:34, 25.00s/it]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2219/2230 [14:19:22<04:34, 25.00s/it]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2219/2230 [14:19:22<04:34, 25.00s/it]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2219/2230 [14:19:22<04:34, 25.00s/it]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2219/2230 [14:19:22<04:34, 25.00s/it]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+100%|███████████████████████████████████████████████████████████████████████████▋| 2219/2230 [14:19:22<04:34, 25.00s/it]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2219/2230 [14:19:22<04:34, 25.00s/it]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0394, 'learning_rate': 2.427745664739884e-06, 'epoch': 9.96} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0335, 'learning_rate': 2.2543352601156066e-06, 'epoch': 9.96} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0213, 'learning_rate': 2.0809248554913294e-06, 'epoch': 9.96} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:14,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:14,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:14,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:32:14,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:14,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:14,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:14,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:26,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:26,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:30,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:32:30,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:30,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:37,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:37,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:37,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:37,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:37,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:32:37,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:37,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.034, 'learning_rate': 1.7341040462427744e-06, 'epoch': 9.97} +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:37,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:32:53,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:32:53,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:57,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:32:57,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:01,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:01,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:01,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:01,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:01,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0296, 'learning_rate': 1.560693641618497e-06, 'epoch': 9.98} +[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:11,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:14,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:16,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:16,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:33:19,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:33:21,639 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:33:23,653 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:33:23,653 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-27 07:33:25,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:27,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:29,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:31,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:33,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:35,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:36,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:36,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:40,236 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:41,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:43,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:46,351 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:47,752 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:50,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:50,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:51,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:54,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:55,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:57,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:59,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:33:59,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:34:02,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:34:04,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:34:06,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-27 07:34:06,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0134, 'learning_rate': 6.936416184971098e-07, 'epoch': 10.0}
+[INFO|configuration_utils.py:438] 2022-03-27 07:34:06,768 >> Configuration saved in ./config.jsons of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|configuration_utils.py:438] 2022-03-27 07:34:18,633 >> Configuration saved in ./config.jsons of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|configuration_utils.py:438] 2022-03-27 07:34:18,633 >> Configuration saved in ./config.jsons of the input, floating-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...