diff --git "a/wandb/run-20220327_210229-2wif55w7/files/output.log" "b/wandb/run-20220327_210229-2wif55w7/files/output.log" --- "a/wandb/run-20220327_210229-2wif55w7/files/output.log" +++ "b/wandb/run-20220327_210229-2wif55w7/files/output.log" @@ -24689,3 +24689,2709 @@ {'eval_loss': 4.259941577911377, 'eval_wer': 1.0190995636652123, 'eval_runtime': 524.0014, 'eval_samples_per_second': 5.042, 'eval_steps_per_second': 0.632, 'epoch': 8.97} [INFO|trainer.py:2369] 2022-03-28 09:48:34,975 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-28 09:48:34,975 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-28 09:48:34,975 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-28 09:48:34,975 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-28 09:48:34,975 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-28 09:48:34,975 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-28 09:48:34,975 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-28 09:48:34,975 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-28 09:48:34,975 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 09:59:49,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 09:59:49,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 09:59:49,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 09:59:55,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 09:59:55,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3549, 'learning_rate': 4.040462427745664e-06, 'epoch': 8.97} +[WARNING|modeling_utils.py:388] 2022-03-28 09:59:59,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 09:59:59,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 09:59:59,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 09:59:59,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:00:08,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:00:08,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:00:12,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:00:12,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:00:12,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:00:12,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|████████████████████████████████████████████████████████████████▋ | 2002/2230 [12:57:46<10:11:09, 160.83s/it] Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:20,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:00:20,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:20,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:26,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:28,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:28,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:00:32,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:00:34,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 09:24:45,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|█████████████████████████████████████████████████████████████████▌ | 2003/2230 [12:58:05<7:27:11, 118.20s/it][WARNING|modeling_bart.py:1051] 2022-03-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████▌ | 2003/2230 [12:58:05<7:27:11, 118.20s/it][WARNING|modeling_bart.py:1051] 2022-03-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2608, 'learning_rate': 4.005780346820809e-06, 'epoch': 8.98} +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:40,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:42,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:44,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:46,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:48,693 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:00:50,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:52,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:52,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:54,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:56,281 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:58,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:00:59,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:01:02,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:04,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:05,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:05,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:08,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:10,125 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:12,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:01:15,073 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:16,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:16,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:17,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:20,379 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:23,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:24,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:01:24,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3191, 'learning_rate': 3.936416184971098e-06, 'epoch': 9.0} +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:29,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:29,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:33,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:33,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:36,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:36,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:01:40,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:40,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:47,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:51,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:51,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:01:55,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:55,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0172, 'learning_rate': 3.91907514450867e-06, 'epoch': 9.0} +[WARNING|modeling_utils.py:388] 2022-03-28 10:01:58,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8855, 'learning_rate': 3.901734104046242e-06, 'epoch': 9.01} +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:02:02,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋ | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▋       | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9422, 'learning_rate': 3.8843930635838145e-06, 'epoch': 9.01} + 90%|██████████████████████████████████████████████████████████████████▋       | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋       | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋       | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋       | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋       | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋       | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▋ | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋ | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋ | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋ | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋ | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋ | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▋ | 2010/2230 [13:00:20<1:55:04, 31.38s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.9771, 'learning_rate': 3.867052023121387e-06, 'epoch': 9.02}
+ 90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:01:15<1:46:48, 29.40s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.995, 'learning_rate': 3.8497109826589594e-06, 'epoch': 9.02}
+{'loss': 3.881, 'learning_rate': 3.8323699421965315e-06, 'epoch': 9.03}
+{'loss': 3.9283, 'learning_rate': 3.8150289017341036e-06, 'epoch': 9.03}
+{'loss': 3.8494, 'learning_rate': 3.797687861271676e-06, 'epoch': 9.04}
+{'loss': 3.8015, 'learning_rate': 3.7803468208092486e-06, 'epoch': 9.04}
+{'loss': 3.8063, 'learning_rate': 3.7630057803468206e-06, 'epoch': 9.04}
+ 90%|██████████████████████████████████████████████████████████████████▉ | 2018/2230 [13:03:57<1:35:38, 27.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.7155, 'learning_rate': 3.745664739884393e-06, 'epoch': 9.05}
+{'loss': 3.7095, 'learning_rate': 3.728323699421965e-06, 'epoch': 9.05}
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5699, 'learning_rate': 3.6936416184971097e-06, 'epoch': 9.06} + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5183, 'learning_rate': 3.676300578034682e-06, 'epoch': 9.07} + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5259, 'learning_rate': 3.6589595375722543e-06, 'epoch': 9.07} + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4793, 'learning_rate': 3.6416184971098264e-06, 'epoch': 9.08} + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4506, 'learning_rate': 3.624277456647399e-06, 'epoch': 9.08} + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:04:51<1:33:46, 26.79s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4071, 'learning_rate': 3.606936416184971e-06, 'epoch': 9.09} + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4044, 'learning_rate': 3.5895953757225434e-06, 'epoch': 9.09} + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.398, 'learning_rate': 3.5722543352601155e-06, 'epoch': 9.09} + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4986, 'learning_rate': 3.554913294797688e-06, 'epoch': 9.1} + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▏ | 2026/2230 [13:07:24<1:27:32, 25.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:11:21,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:11:21,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:11:21,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:11:21,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:11:21,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:11:21,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:11:21,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4046, 'learning_rate': 3.53757225433526e-06, 'epoch': 9.1} + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:09:04<1:23:33, 25.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████��████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3652, 'learning_rate': 3.5202312138728325e-06, 'epoch': 9.11} + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4288, 'learning_rate': 3.5028901734104046e-06, 'epoch': 9.11} + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3382, 'learning_rate': 3.485549132947977e-06, 'epoch': 9.12} + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2031/2230 [13:09:28<1:22:26, 24.86s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2034/2230 [13:10:41<1:19:21, 24.29s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▍ | 2034/2230 [13:10:41<1:19:21, 24.29s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3078, 'learning_rate': 3.468208092485549e-06, 'epoch': 9.12} + 91%|███████████████████████████████████████████████████████████████████▍ | 2034/2230 [13:10:41<1:19:21, 24.29s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3217, 'learning_rate': 3.4508670520231212e-06, 'epoch': 9.13} +[WARNING|modeling_bart.py:1051] 2022-03-28 10:13:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2776, 'learning_rate': 3.4335260115606937e-06, 'epoch': 9.13} +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:13:41,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▌ | 2037/2230 [13:11:51<1:15:46, 23.56s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▌ | 2037/2230 [13:11:51<1:15:46, 23.56s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3606, 'learning_rate': 3.416184971098266e-06, 'epoch': 9.13} + 91%|███████████████████████████████████████████████████████████████████▌ | 2037/2230 [13:11:51<1:15:46, 23.56s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|█████████████████��█████████████████████████████████████████████████▌ | 2037/2230 [13:11:51<1:15:46, 23.56s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▌ | 2037/2230 [13:11:51<1:15:46, 23.56s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▌ | 2037/2230 [13:11:51<1:15:46, 23.56s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|███████████████████████████████████████████████████████████████████▌ | 2037/2230 [13:11:51<1:15:46, 23.56s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▌ | 2037/2230 [13:11:51<1:15:46, 23.56s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|███████████████████████████████████████████████████████████████████▌ | 2037/2230 [13:11:51<1:15:46, 23.56s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.3422, 'learning_rate': 3.3988439306358383e-06, 'epoch': 9.14} +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3055, 'learning_rate': 3.3815028901734103e-06, 'epoch': 9.14} +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:14:41,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:30,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:15:30,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2051, 'learning_rate': 3.364161849710983e-06, 'epoch': 9.15} +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:30,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:30,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:30,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:30,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:30,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:30,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:15:46,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:46,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:50,806 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:50,806 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:50,806 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2975, 'learning_rate': 3.346820809248555e-06, 'epoch': 9.15} +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:50,806 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:58,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:15:58,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:58,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:58,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:15:58,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:09,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:09,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:13,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:16:13,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2273, 'learning_rate': 3.3294797687861274e-06, 'epoch': 9.16} +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:13,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:13,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:21,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:21,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:21,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:21,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:16:21,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:31,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:31,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:31,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:31,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2341, 'learning_rate': 3.3121387283236995e-06, 'epoch': 9.16} +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:31,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:31,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:16:44,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:16:44,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:16:44,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:16:44,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:16:44,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:54,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:54,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.2985, 'learning_rate': 3.294797687861272e-06, 'epoch': 9.17} +[WARNING|modeling_utils.py:388] 2022-03-28 10:16:54,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:00,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:00,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:00,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:06,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:06,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:06,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:17:12,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:12,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:12,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.278, 'learning_rate': 3.277456647398844e-06, 'epoch': 9.17} +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:12,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:17:20,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:17:20,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:24,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:17:24,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:17:28,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:17:28,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:32,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:32,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2753, 'learning_rate': 3.260115606936416e-06, 'epoch': 9.17} +[WARNING|modeling_bart.py:1051] 2022-03-28 10:17:37,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:17:37,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:17:41,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:43,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:43,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:17:47,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:17:47,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:51,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:51,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:17:53,489 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:17:53,489 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:17:57,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:17:59,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:18:01,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:18:03,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:18:03,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:18:07,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:09,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:09,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:11,501 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:13,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:15,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:17,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:18:19,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:21,352 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:23,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:25,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:25,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:27,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:28,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:18:30,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:32,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:34,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:37,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:37,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:37,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:40,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:18:44,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:45,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:47,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:49,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:50,859 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:54,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:54,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:18:55,834 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:57,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:18:58,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:01,999 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:03,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:06,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:06,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:19:07,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:09,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:11,887 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:14,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:15,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:15,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:18,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:19:19,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:22,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:24,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:26,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:26,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:27,930 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:30,071 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:19:32,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:34,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:34,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:36,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:38,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:40,729 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:42,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:19:42,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:43,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:46,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:48,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:49,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:49,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2751, 'learning_rate': 3.0693641618497114e-06, 'epoch': 9.22} +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:53,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:19:53,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:19:57,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:00,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:00,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:04,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:04,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:08,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:20:08,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:11,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:15,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:15,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:18,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:18,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1168, 'learning_rate': 3.0520231213872834e-06, 'epoch': 9.23} +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:22,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:20:22,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:25,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:29,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:29,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:33,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:33,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:36,598 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:20:36,598 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:40,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:43,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:43,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:47,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:47,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9596, 'learning_rate': 3.0346820809248555e-06, 'epoch': 9.23} +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:50,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:20:50,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:54,370 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:57,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:20:57,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:01,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:04,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:04,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:08,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:08,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:11,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:11,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:11,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:15,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:18,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:18,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:25,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:29,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:29,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:32,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:32,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:35,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0323, 'learning_rate': 3e-06, 'epoch': 9.24} +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0487, 'learning_rate': 2.9826589595375726e-06, 'epoch': 9.25} +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9915, 'learning_rate': 2.9653179190751446e-06, 'epoch': 9.25} +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8636, 'learning_rate': 2.947976878612717e-06, 'epoch': 9.26} +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9274, 'learning_rate': 2.930635838150289e-06, 'epoch': 9.26} +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8802, 'learning_rate': 2.9132947976878613e-06, 'epoch': 9.26} +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8013, 'learning_rate': 2.8959537572254333e-06, 'epoch': 9.27} +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.8256, 'learning_rate': 2.878612716763006e-06, 'epoch': 9.27} +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.6916, 'learning_rate': 2.861271676300578e-06, 'epoch': 9.28} +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:21:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.6942, 'learning_rate': 2.8439306358381504e-06, 'epoch': 9.28} + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7135, 'learning_rate': 2.8265895953757224e-06, 'epoch': 9.29} + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:23:14<1:10:32, 26.45s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5903, 'learning_rate': 2.8092485549132945e-06, 'epoch': 9.29} + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5691, 'learning_rate': 2.791907514450867e-06, 'epoch': 9.3} + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:24:05<1:08:38, 26.07s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4745, 'learning_rate': 2.774566473988439e-06, 'epoch': 9.3} + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4411, 'learning_rate': 2.7572254335260116e-06, 'epoch': 9.3} + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4509, 'learning_rate': 2.7398843930635836e-06, 'epoch': 9.31} + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4391, 'learning_rate': 2.722543352601156e-06, 'epoch': 9.31} + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5249, 'learning_rate': 2.705202312138728e-06, 'epoch': 9.32} + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4409, 'learning_rate': 2.6878612716763007e-06, 'epoch': 9.32} + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▊ | 2074/2230 [13:24:56<1:07:06, 25.81s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:29:42,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:29:42,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:29:42,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:29:42,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:29:42,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:29:42,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:29:42,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:29:42,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:29:42,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:29:42,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3896, 'learning_rate': 2.6705202312138728e-06, 'epoch': 9.33} +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4046, 'learning_rate': 2.6531791907514452e-06, 'epoch': 9.33} +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4891, 'learning_rate': 2.6358381502890173e-06, 'epoch': 9.34} +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4354, 'learning_rate': 2.61849710982659e-06, 'epoch': 9.34} +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:30:03,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3831, 'learning_rate': 2.601156069364162e-06, 'epoch': 9.35} + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3975, 'learning_rate': 2.583815028901734e-06, 'epoch': 9.35} + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████ | 2084/2230 [13:29:04<59:15, 24.35s/it] Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3543, 'learning_rate': 2.5664739884393064e-06, 'epoch': 9.35} +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3374, 'learning_rate': 2.5491329479768785e-06, 'epoch': 9.36} +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:32:20,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|███████████████████████████████████████████████████████████████████████▏ | 2088/2230 [13:30:38<56:03, 23.68s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2089/2230 [13:31:00<54:54, 23.36s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2089/2230 [13:31:00<54:54, 23.36s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.239, 'learning_rate': 2.514450867052023e-06, 'epoch': 9.37} + 94%|███████████████████████████████████████████████████████████████████████▏ | 2089/2230 [13:31:00<54:54, 23.36s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2089/2230 [13:31:00<54:54, 23.36s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2089/2230 [13:31:00<54:54, 23.36s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▏ | 2089/2230 [13:31:00<54:54, 23.36s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|███████████████████████████████████████████████████████████████████████▏ | 2089/2230 [13:31:00<54:54, 23.36s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.3719, 'learning_rate': 2.4971098265895955e-06, 'epoch': 9.37} +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:33:46,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:09,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:09,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3023, 'learning_rate': 2.4797687861271676e-06, 'epoch': 9.38} +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:13,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:35,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:35,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:35,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:34:39,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:39,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:39,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:39,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:39,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:39,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:39,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:34:54,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:34:54,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2093/2230 [13:32:27<50:02, 21.91s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2093/2230 [13:32:27<50:02, 21.91s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2331, 'learning_rate': 2.445086705202312e-06, 'epoch': 9.39} + 94%|███████████████████████████████████████████████████████████████████████▎ | 2093/2230 [13:32:27<50:02, 21.91s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2093/2230 [13:32:27<50:02, 21.91s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|███████████████████████████████████████████████████████████████████████▎ | 2093/2230 [13:32:27<50:02, 21.91s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:35:08,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:35:08,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:35:08,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:35:08,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:35:08,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:35:08,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:35:08,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.2789, 'learning_rate': 2.4277456647398847e-06, 'epoch': 9.39}
+[WARNING|modeling_utils.py:388] 2022-03-28 10:35:08,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:35:08,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:27,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:27,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:27,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:33,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:33,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:33,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:39,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:39,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.2706, 'learning_rate': 2.4104046242774567e-06, 'epoch': 9.39}
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:39,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:39,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:35:47,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:35:47,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:51,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:35:51,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:35:55,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:35:55,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 94%|███████████████████████████████████████████████████████████████████████▍ | 2096/2230 [13:33:28<46:20, 20.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 94%|███████████████████████████████████████████████████████████████████████▍ | 2096/2230 [13:33:28<46:20, 20.75s/it]g-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:01,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:01,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:36:06,164 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:36:06,164 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:10,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:12,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:12,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:12,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:18,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:18,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.2796, 'learning_rate': 2.3757225433526013e-06, 'epoch': 9.4}
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:36:22,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:36:22,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:26,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:28,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:30,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:32,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:32,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:00:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 94%|███████████████████████████████████████████████████████████████████████▌ | 2098/2230 [13:34:05<42:43, 19.42s/it][WARNING|modeling_bart.py:1051] 2022-03-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 94%|███████████████████████████████████████████████████████████████████████▌ | 2098/2230 [13:34:05<42:43, 19.42s/it][WARNING|modeling_bart.py:1051] 2022-03-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:36:38,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:36:40,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:36:42,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:36:42,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:46,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:48,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:50,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:52,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:52,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:54,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:56,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:36:58,464 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:00,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:02,266 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:04,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:06,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:08,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:08,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:10,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:12,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:14,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:16,086 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:17,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:19,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:21,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:21,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:24,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:26,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:28,165 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:29,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:32,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:34,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:35,876 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:35,876 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:38,916 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:41,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:43,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:45,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:47,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:47,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:49,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:51,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:53,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:55,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:55,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:57,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:37:59,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:01,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:03,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:03,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:05,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:07,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:09,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:11,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:11,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:13,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:15,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:17,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:19,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:19,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:20,550 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:20,550 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:24,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:28,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:28,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:31,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:31,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:35,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:35,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:38,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:42,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:42,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:46,154 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:46,154 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:46,154 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:49,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:49,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:53,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:53,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:38:56,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:00,472 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:00,472 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:04,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:07,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:07,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:11,036 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:11,036 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:14,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:14,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:14,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:18,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:18,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:21,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:25,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:25,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:28,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:28,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:32,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:35,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:35,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:39,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:39,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:42,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:45,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:45,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0563, 'learning_rate': 2.1502890173410407e-06, 'epoch': 9.46} +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:52,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:52,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:56,288 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9696, 'learning_rate': 2.1329479768786128e-06, 'epoch': 9.47} +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9416, 'learning_rate': 2.1156069364161853e-06, 'epoch': 9.47} +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0014, 'learning_rate': 2.0982658959537573e-06, 'epoch': 9.48} +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:39:59,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0486, 'learning_rate': 2.0809248554913294e-06, 'epoch': 9.48} + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2114/2230 [13:39:05<50:41, 26.22s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8973, 'learning_rate': 2.0635838150289015e-06, 'epoch': 9.48} + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8664, 'learning_rate': 2.046242774566474e-06, 'epoch': 9.49} + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.8263, 'learning_rate': 2.028901734104046e-06, 'epoch': 9.49} + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7801, 'learning_rate': 2.0115606936416185e-06, 'epoch': 9.5} + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7214, 'learning_rate': 1.9942196531791906e-06, 'epoch': 9.5} + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████ | 2115/2230 [13:39:32<50:39, 26.43s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.623, 'learning_rate': 1.976878612716763e-06, 'epoch': 9.51} + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.6586, 'learning_rate': 1.959537572254335e-06, 'epoch': 9.51} + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.5181, 'learning_rate': 1.9421965317919072e-06, 'epoch': 9.52} + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4883, 'learning_rate': 1.9248554913294797e-06, 'epoch': 9.52} + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.6009, 'learning_rate': 1.9075144508670518e-06, 'epoch': 9.52} + 95%|████████████████████████████████████████████████████████████████████████▎ | 2120/2230 [13:41:45<48:29, 26.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5483, 'learning_rate': 1.8901734104046243e-06, 'epoch': 9.53} +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4663, 'learning_rate': 1.8728323699421966e-06, 'epoch': 9.53} +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:46:03,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.424, 'learning_rate': 1.8554913294797688e-06, 'epoch': 9.54} + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4261, 'learning_rate': 1.838150289017341e-06, 'epoch': 9.54} + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4097, 'learning_rate': 1.8208092485549132e-06, 'epoch': 9.55} + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.4119, 'learning_rate': 1.8034682080924855e-06, 'epoch': 9.55} + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3827, 'learning_rate': 1.7861271676300577e-06, 'epoch': 9.56} + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 95%|████████████████████████████████████████████████████████████████████████▍ | 2127/2230 [13:44:43<43:41, 25.45s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4139, 'learning_rate': 1.76878612716763e-06, 'epoch': 9.56} +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:49:05,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4792, 'learning_rate': 1.7514450867052023e-06, 'epoch': 9.57} + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3544, 'learning_rate': 1.7341040462427746e-06, 'epoch': 9.57} + 96%|████████████████████████████████████████████████████████████████████████▋ | 2133/2230 [13:47:11<39:42, 24.56s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:50:13,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:13,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:13,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:13,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:13,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:13,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:13,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3638, 'learning_rate': 1.7167630057803469e-06, 'epoch': 9.57} +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.3551, 'learning_rate': 1.6994219653179191e-06, 'epoch': 9.58} +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3162, 'learning_rate': 1.6820809248554914e-06, 'epoch': 9.58} +[WARNING|modeling_utils.py:388] 2022-03-28 10:50:27,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:51:20,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:51:20,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:51:20,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:51:20,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:51:20,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:51:20,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:51:20,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:51:20,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:51:20,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3552, 'learning_rate': 1.6647398843930637e-06, 'epoch': 9.59} + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▊ | 2138/2230 [13:49:08<36:13, 23.63s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2356, 'learning_rate': 1.647398843930636e-06, 'epoch': 9.59} +[WARNING|modeling_bart.py:1051] 2022-03-28 10:52:05,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:52:05,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:52:05,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:52:05,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:52:05,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:52:05,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:52:05,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:52:05,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:52:05,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:52:24,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:52:24,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2151, 'learning_rate': 1.630057803468208e-06, 'epoch': 9.6} +[WARNING|modeling_utils.py:388] 2022-03-28 10:52:24,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:52:24,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:52:24,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:52:24,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:52:24,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:52:38,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:52:38,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:52:42,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:52:42,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.3076, 'learning_rate': 1.6127167630057803e-06, 'epoch': 9.6} + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3143, 'learning_rate': 1.5953757225433526e-06, 'epoch': 9.61} + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [13:50:15<33:42, 22.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:21,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:21,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:21,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:21,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████ | 2143/2230 [13:50:58<31:52, 21.99s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████ | 2143/2230 [13:50:58<31:52, 21.99s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.2887, 'learning_rate': 1.5780346820809249e-06, 'epoch': 9.61} + 96%|█████████████████████████████████████████████████████████████████████████ | 2143/2230 [13:50:58<31:52, 21.99s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:35,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:35,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:35,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:35,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:35,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:35,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:53:35,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:53:49,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:53:49,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2681, 'learning_rate': 1.5606936416184972e-06, 'epoch': 9.61} +[WARNING|modeling_bart.py:1051] 2022-03-28 10:53:49,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:55,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:55,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:53:55,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:54:01,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:01,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:01,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:08,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:08,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:08,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2485, 'learning_rate': 1.5433526011560694e-06, 'epoch': 9.62} +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:14,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:54:14,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:14,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:20,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:20,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:20,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:26,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:26,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:54:26,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▏ | 2146/2230 [13:51:59<29:07, 20.80s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:32,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:34,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:34,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:54:39,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:54:39,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:54:43,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:43,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:54:47,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▏ | 2147/2230 [13:52:18<27:56, 20.20s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▏ | 2147/2230 [13:52:18<27:56, 20.20s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:51,281 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:54:51,281 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:54:55,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:54:57,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 10:54:57,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:55:01,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:55:03,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:55:05,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:55:05,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.3599, 'learning_rate': 1.4913294797687863e-06, 'epoch': 9.63}
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:55:09,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:55:11,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:55:13,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 10:55:15,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:19,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:21,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:23,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:25,572 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:27,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:29,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:31,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:33,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:35,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:37,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:39,998 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:42,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:43,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:45,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:47,370 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:49,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:50,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:54,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:56,023 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:57,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:55:59,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:02,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:04,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:05,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:08,694 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:10,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:11,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:14,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:15,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:18,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:19,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:22,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:23,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:26,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:28,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:29,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:32,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:34,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:36,313 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:38,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:40,313 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:42,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:44,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:45,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:48,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:50,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:52,197 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.2417, 'learning_rate': 1.3352601156069364e-06, 'epoch': 9.67}
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:56,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:56:59,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:03,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:07,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:10,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:14,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:17,755 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:21,279 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9626, 'learning_rate': 1.3179190751445087e-06, 'epoch': 9.68}
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:24,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:28,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:31,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:35,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:39,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:42,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:46,020 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:49,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:53,096 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:53,096 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:57:56,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:00,020 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:03,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:07,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:10,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:13,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:17,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:20,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:24,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9245, 'learning_rate': 1.2658959537572255e-06, 'epoch': 9.69}
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 10:58:27,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8779, 'learning_rate': 1.2485549132947978e-06, 'epoch': 9.7} + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2162/2230 [13:56:41<27:58, 24.69s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8989, 'learning_rate': 1.23121387283237e-06, 'epoch': 9.7} + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8897, 'learning_rate': 1.2138728323699423e-06, 'epoch': 9.7} + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9576, 'learning_rate': 1.1965317919075146e-06, 'epoch': 9.71} + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8363, 'learning_rate': 1.1791907514450867e-06, 'epoch': 9.71} + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9311, 'learning_rate': 1.161849710982659e-06, 'epoch': 9.72} + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.7241, 'learning_rate': 1.1445086705202312e-06, 'epoch': 9.72} + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.7679, 'learning_rate': 1.1271676300578035e-06, 'epoch': 9.73} + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7678, 'learning_rate': 1.1098265895953758e-06, 'epoch': 9.73} + 97%|█████████████████████████████████████████████████████████████████████████▋ | 2163/2230 [13:57:09<28:50, 25.82s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 2171/2230 [14:00:42<25:49, 26.26s/it]
+{'loss': 3.5697, 'learning_rate': 1.092485549132948e-06, 'epoch': 9.74}
+{'loss': 3.6388, 'learning_rate': 1.0751445086705204e-06, 'epoch': 9.74}
+ 97%|██████████████████████████████████████████████████████████████████████████ | 2173/2230 [14:01:33<24:35, 25.88s/it]
+{'loss': 3.4868, 'learning_rate': 1.0578034682080926e-06, 'epoch': 9.74}
+{'loss': 3.5181, 'learning_rate': 1.0404624277456647e-06, 'epoch': 9.75}
+{'loss': 3.5205, 'learning_rate': 1.023121387283237e-06, 'epoch': 9.75}
+{'loss': 3.5082, 'learning_rate': 1.0057803468208093e-06, 'epoch': 9.76}
+{'loss': 3.4763, 'learning_rate': 9.884393063583815e-07, 'epoch': 9.76}
+{'loss': 3.4069, 'learning_rate': 9.710982658959536e-07, 'epoch': 9.77}
+{'loss': 3.4645, 'learning_rate': 9.537572254335259e-07, 'epoch': 9.77}
+[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3022, 'learning_rate': 9.364161849710983e-07, 'epoch': 9.78} +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.44, 'learning_rate': 9.190751445086705e-07, 'epoch': 9.78} +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3826, 'learning_rate': 9.017341040462427e-07, 'epoch': 9.78} +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4476, 'learning_rate': 8.84393063583815e-07, 'epoch': 9.79} +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:06:48,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.3724, 'learning_rate': 8.670520231213873e-07, 'epoch': 9.79} +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:32,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3177, 'learning_rate': 8.497109826589596e-07, 'epoch': 9.8} +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:08:57,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:06:51<17:23, 23.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:06:51<17:23, 23.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3174, 'learning_rate': 8.323699421965318e-07, 'epoch': 9.8} + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:06:51<17:23, 23.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:06:51<17:23, 23.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:06:51<17:23, 23.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:06:51<17:23, 23.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:06:51<17:23, 23.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:06:51<17:23, 23.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:06:51<17:23, 23.72s/it]g-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:09:40,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:09:40,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:09:40,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:09:40,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.399, 'learning_rate': 8.15028901734104e-07, 'epoch': 9.81} +[WARNING|modeling_utils.py:388] 2022-03-28 11:09:40,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:09:50,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:09:50,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:09:50,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:09:50,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:09:50,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:10:01,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:10:01,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:10:01,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:10:01,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.289, 'learning_rate': 7.976878612716763e-07, 'epoch': 9.81} + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [14:07:38<16:29, 23.55s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2944, 'learning_rate': 7.803468208092486e-07, 'epoch': 9.82} + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▌ | 2189/2230 [14:08:00<15:53, 23.25s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▋ | 2190/2230 [14:08:22<15:18, 22.97s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▋ | 2190/2230 [14:08:22<15:18, 22.97s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|██████████████████████████████████████████████████████████████████████████▋ | 2190/2230 [14:08:22<15:18, 22.97s/it] Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:11:00,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:00,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:04,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:04,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:08,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:08,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:08,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:11:08,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:08,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2808, 'learning_rate': 7.456647398843931e-07, 'epoch': 9.83} +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:08,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:20,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:20,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:20,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:20,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:11:20,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:31,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:31,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:35,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:35,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3136, 'learning_rate': 7.283236994219653e-07, 'epoch': 9.83} +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:35,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:35,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:11:35,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:35,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:47,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:47,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:11:51,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:11:51,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:11:51,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:11:57,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:11:57,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:00,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:00,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:00,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:00,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:00,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:12:10,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:10,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:10,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:16,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:16,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:16,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:16,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.2466, 'learning_rate': 6.936416184971098e-07, 'epoch': 9.84} +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:25,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:25,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:28,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:31,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:31,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:31,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:31,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:31,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:41,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:41,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.246, 'learning_rate': 6.76300578034682e-07, 'epoch': 9.84} +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:41,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:41,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:48,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:48,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:53,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:12:53,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:57,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:57,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:12:57,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2324, 'learning_rate': 6.589595375722543e-07, 'epoch': 9.85} +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:03,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:03,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:13:07,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:13:07,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:11,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:13,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:13,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:13,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 10:36:36,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|██████████████████████████████████████████████████████████████████████████▉ | 2197/2230 [14:10:46<11:01, 20.05s/it][WARNING|modeling_bart.py:1051] 2022-03-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|██████████████████████████████████████████████████████████████████████████▉ | 2197/2230 [14:10:46<11:01, 20.05s/it][WARNING|modeling_bart.py:1051] 2022-03-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:21,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:23,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:23,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:23,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:29,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:31,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:13:33,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:33,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:35,814 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:37,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:41,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:43,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:13:45,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:47,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:49,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:49,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:51,839 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:53,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:55,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:13:57,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:13:59,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:03,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:04,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:04,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:04,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:09,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:14:11,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:12,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:14,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:16,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:17,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:21,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:21,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:14:22,661 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:24,244 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:27,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:28,857 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:30,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:33,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:33,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:14:34,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:37,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:38,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:41,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:42,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:42,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:45,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:14:46,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:49,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:51,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:53,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:53,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:54,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:14:57,068 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:14:59,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:01,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:01,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:03,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:05,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:07,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:09,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:15:09,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:10,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:13,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:15,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:15,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2673, 'learning_rate': 4.6820809248554914e-07, 'epoch': 9.9} +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:18,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:18,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:15:22,041 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:25,733 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:25,733 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:29,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:29,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:32,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:32,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:15:36,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:39,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:39,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:47,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:15:47,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:50,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:54,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:54,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:57,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:15:57,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:00,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:16:04,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:04,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:07,723 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:07,723 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:11,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:14,640 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:14,640 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.8833, 'learning_rate': 4.3352601156069365e-07, 'epoch': 9.91} +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:18,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:18,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:21,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:24,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:24,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:28,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:28,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:16:31,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:35,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:35,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:38,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:38,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0039, 'learning_rate': 4.161849710982659e-07, 'epoch': 9.91} +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:41,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:45,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:16:45,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:48,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:48,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:52,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:55,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:55,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:16:58,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:16:58,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:02,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9315, 'learning_rate': 3.9884393063583815e-07, 'epoch': 9.91} +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7539, 'learning_rate': 3.8150289017341043e-07, 'epoch': 9.92} +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.6727, 'learning_rate': 3.6416184971098266e-07, 'epoch': 9.92} +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5854, 'learning_rate': 3.468208092485549e-07, 'epoch': 9.93} +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.4982, 'learning_rate': 3.2947976878612716e-07, 'epoch': 9.93} +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.513, 'learning_rate': 3.1213872832369944e-07, 'epoch': 9.94} +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:17:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.401, 'learning_rate': 2.7745664739884395e-07, 'epoch': 9.95} +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.441, 'learning_rate': 2.601156069364162e-07, 'epoch': 9.95} +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:19:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4043, 'learning_rate': 2.427745664739884e-07, 'epoch': 9.96} +100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2220/2230 [14:18:24<04:05, 24.56s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+100%|███████████████████████████████████████████████████████████████████████████▋| 2221/2230 [14:18:47<03:36, 24.09s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|███████████████████████████████████████████████████████████████████████████▋| 2221/2230 [14:18:47<03:36, 24.09s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3952, 'learning_rate': 2.2543352601156068e-07, 'epoch': 9.96} +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:22,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:22,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:22,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:22,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:22,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:21:22,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:22,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:36,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:36,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:36,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:40,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:40,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:21:44,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:44,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:48,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:48,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:48,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:48,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:48,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:21:48,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:48,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:21:48,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2441, 'learning_rate': 1.9075144508670522e-07, 'epoch': 9.97} +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:04,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:04,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:04,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:04,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:13,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:13,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:13,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:13,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:21,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:21,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.316, 'learning_rate': 1.7341040462427744e-07, 'epoch': 9.97} +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:21,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 11:22:27,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:27,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:27,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:33,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:33,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:33,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 11:22:33,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:41,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:41,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.2267, 'learning_rate': 1.5606936416184972e-07, 'epoch': 9.98}
+[WARNING|modeling_utils.py:388] 2022-03-28 11:22:45,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:22:47,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:22:47,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:51,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:54,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:56,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:58,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:58,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 11:22:58,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:02,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:04,222 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:06,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:08,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:10,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:12,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:13,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:15,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:15,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:17,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:19,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:22,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:24,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:25,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:28,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:28,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:30,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:31,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:34,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:35,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:37,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:37,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:40,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:42,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:43,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:46,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 11:23:46,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.1966, 'learning_rate': 6.936416184971099e-08, 'epoch': 10.0}
+100%|████████████████████████████████████████████████████████████████████████████| 2230/2230 [14:21:16<00:00, 23.17s/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|feature_extraction_utils.py:324] 2022-03-28 11:23:59,340 >> Configuration saved in ./preprocessor_config.jsons/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|feature_extraction_utils.py:324] 2022-03-28 11:24:11,634 >> Configuration saved in ./preprocessor_config.jsons/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|feature_extraction_utils.py:324] 2022-03-28 11:24:11,634 >> Configuration saved in ./preprocessor_config.jsons/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|feature_extraction_utils.py:324] 2022-03-28 11:24:11,634 >> Configuration saved in ./preprocessor_config.jsons/it]g-point operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...